BACKGROUND OF THE INVENTION



This application claims the benefit of U.S. Provisional Application, S.N. 
60/105,052, filed October 21, 1998 and U.S. Provisional Application, S.N. 60/134,175, 
filed May 13, 1999. The government may own rights in the present invention pursuant to
grant numbers DK-20595, DK-47486, and DK-47487 from United States Public Health 
Service. 

1. Field of the Invention 

The present invention relates generally to the field of treatment of diabetes
mellitus. More particularly, it concerns methods of diagnosing a propensity for type 2
diabetes mellitus, methods of identifying compounds to treat type 2 diabetes mellitus, and 
new nucleic acid sequences encoding polypeptides related to type 2 diabetes mellitus. 

2. Description of Related Art

Diabetes mellitus is a phenotypically and genetically heterogeneous group of
metabolic diseases, all of which are characterized by high blood glucose levels resulting
from an absolute or relative deficiency of the hormone insulin (The Expert Committee on
the Diagnosis and Classification of Diabetes Mellitus, 1997). The chronic hyperglycemia
damages the eyes, kidneys, nerves, heart and blood vessels, leading to blindness, kidney
and heart disease, stroke, loss of limbs and reduced life expectancy. Diabetes mellitus is
a major public health problem affecting more than 120 million people worldwide (King et
al., 1998). It has an enormous economic impact on society and the direct medical and
indirect expenditures attributable to diabetes in 1997 in the United States alone were $98
billion (American Diabetes Assoc., 1998).

Genetics plays an important role in the development of diabetes, with some forms
resulting from mutations in a single gene whereas others are oligogenic or polygenic in
origin. The monogenic forms of diabetes may account for 5% of all cases of diabetes and
have diverse causes. Diabetes can result from mutations in the insulin (Steiner et al.,
1995) and insulin receptor genes (Taylor et al., 1995) as well as the genes encoding the
glycolytic enzyme glucokinase (Vionnet et al., 1992) and the transcription factors
hepatocyte nuclear factor-1α (HNF-1α), HNF-1β, HNF-4α and insulin promoter factor-1
(IPF-1) (Yamagata et al., 1996a; Horikawa et al., 1997; Yamagata et al., 1996b; Stoffers
et al., 1997). Mutations in these genes lead to impaired pancreatic β-cell function or, in
the case of the insulin receptor, to defects in insulin action in target tissues including the
pancreatic β-cell. In addition to these nuclear-encoded genes, mutations in
maternally-inherited mitochondrial genes can cause diabetes and appear to do so primarily by
impairing pancreatic β-cell function (Maassen and Kadowaki, 1996).

The two most common forms of diabetes, type 1 and type 2 diabetes, have a
complex mode of inheritance. Type 1 diabetes is a common chronic disorder of children
which accounts for about 5-10% of all diabetes. It results from the autoimmunological
destruction of the insulin-producing cells of the pancreas, leading to an absolute
deficiency of insulin and requirement of insulin therapy for survival. Type 1 diabetes was
the first genetically complex disorder to be studied by large-scale genome-wide screening
for susceptibility genes, and these studies showed the importance of the HLA region in
determining susceptibility and revealed the locations of other loci with smaller effects on
susceptibility (Davies et al., 1994; Hashimoto et al., 1994; Lernmark and Ott, 1998).

Type 2 diabetes is the most common form of diabetes, accounting for about 90%
of all cases of diabetes and affecting 10-20% of those over 45 years of age in many
developed countries. It is characterized by defects in insulin action resulting in decreased
glucose uptake by muscle and fat and increased hepatic glucose production, and by
abnormalities in the normal pattern of glucose-stimulated insulin secretion. Type 2
diabetes results from the joint action of multiple genetic and environmental factors.
Linkage studies have led to the localization of susceptibility genes for type 2 diabetes in
Mexican Americans (Hanis et al., 1996), in the linguistically-isolated Swedish-speaking
population living in the Botnia region on the western coast of Finland (Mahtani et al.,
1996), and in the Pima Indians of the southwestern United States (Pratley et al., 1998).
Each study localized susceptibility to largely different regions of the genome, suggesting
that different combinations of susceptibility genes are responsible for type 2 diabetes in
these various populations.

Genome-wide screens for susceptibility genes for complex disorders have become
de rigueur and genes for a number of different complex disorders have been successfully
localized through linkage studies. Although disease genes for complex disorders can be
localized through genetic studies, their identification still represents a major challenge if
there are no candidates in the region of interest. This is due in part to the fact that
recombination events cannot be used to unambiguously define the boundaries of the
region containing the susceptibility locus because of heterogeneity within and between
families. The location of a gene for a complex disorder is defined by a confidence
interval which may be and often is quite large. The future of genetic studies of complex
disorders depends on the ability to identify predisposing genes once they have been
mapped.

There are no examples of the successful identification of a gene for a complex 
disease originally mapped by linkage that can be used to guide such studies. It has been 
proposed that linkage disequilibrium mapping can be used to refine the localization and 
perhaps identify the disease locus (Spielman and Ewens, 1998). However, it is unclear
how successful linkage disequilibrium mapping will be when only affected sibpairs are 
available for study as is the case for many common late-onset disorders such as type 2 
diabetes. 

Moreover, experience in identifying genes for complex disorders is so limited that
it is not known whether the susceptibility is due to only one or a few variants or many.
The presence of a large number of disease-associated variants would confound linkage
disequilibrium studies. Thus, there is a need to provide an exemplary protocol for the
identification of genes in complex disorders and, further, there is a pressing need to identify
the elusive type 2 diabetes susceptibility gene. Despite the desirability of these endeavors,
these needs remain unfulfilled.

SUMMARY OF THE INVENTION 

In some aspects, the present invention relates to methods for screening for
diabetes comprising: a) obtaining sample nucleic acid from an animal; and b) analyzing
the nucleic acid to detect a polymorphism in a calpain-encoding nucleic acid segment or a
protease-encoding nucleic segment; wherein detection of the polymorphism in the nucleic
acid is indicative of a propensity for type 2 diabetes mellitus. In some cases, the nucleic
acid is analyzed to detect a polymorphism in a cysteine protease-encoding nucleic acid.
In some presently preferred methods, the nucleic acid is a calpain-encoding nucleic acid.
The nucleic acid may encode a portion of a CAPN10 gene. For example, the nucleic acid
may encode UCSNP-43 of the CAPN10 gene, wherein the G-allele has been determined
to exist. In particularly preferred embodiments, the nucleic acid encodes a calpain 10
polypeptide, for example: calpain 10a, calpain 10b, calpain 10c, calpain 10d, calpain 10e,
calpain 10f, calpain 10g, or calpain 10h. The calpain-encoding nucleic acid segment or
protease-encoding nucleic segment may be a DNA, for example a cDNA or genomic
DNA. In preferred embodiments, the DNA comprises a gene for a calpain or protease.
The nucleic acid may also be an RNA, for example, an mRNA encoding a calpain or
protease.

In many cases, the methods of the invention will involve the step of analyzing the
nucleic acid by sequencing the nucleic acid to obtain a sequence. The obtained sequence
of the nucleic acid may then be compared to a known nucleic acid sequence of a calpain
or protease gene to determine whether a polymorphism exists. In some preferred
embodiments, the sequenced nucleic acid encodes a portion of a CAPN10 gene, for
example, UCSNP-43 of the CAPN10 gene. In other embodiments, the sequenced nucleic
acid encodes a calpain 10 polypeptide, for example, a calpain 10a, calpain 10b, calpain
10c, calpain 10d, calpain 10e, calpain 10f, calpain 10g, or calpain 10h. In presently
preferred embodiments, the obtained sequence of the nucleic acid is analyzed to detect a
presence or absence of the G-allele at UCSNP-43.
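
As a rough illustration of the comparison step described above, the following Python sketch compares an obtained sequence with a known reference sequence and reports the allele found at one position. The sequences, the position index, and the function names are hypothetical placeholders chosen for illustration only; they are not CAPN10 sequence data or the actual coordinates of UCSNP-43, and the sketch assumes the two sequences are already aligned and of equal length.

```python
def find_polymorphisms(reference, sample):
    """Return (position, reference_base, sample_base) for every mismatch.

    Assumes both sequences are already aligned and of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    ref, smp = reference.upper(), sample.upper()
    return [(i, r, s) for i, (r, s) in enumerate(zip(ref, smp)) if r != s]


def has_allele(sample, position, allele):
    """Report whether the sample carries the given base at a 0-based position."""
    return sample[position].upper() == allele.upper()


# Hypothetical toy sequences, not real CAPN10 data.
reference_seq = "ACGTACATTGCA"
sample_seq = "ACGTACGTTGCA"
snp_position = 6  # hypothetical 0-based index standing in for a SNP site

differences = find_polymorphisms(reference_seq, sample_seq)
g_allele_present = has_allele(sample_seq, snp_position, "G")
```

In practice the obtained sequence would first have to be aligned to the reference, which this sketch does not attempt.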

Analysis of the nucleic acid for a polymorphism may comprise any of a number of 
standard molecular biological methods known to those of skill. For example, PCR, an
RNase protection assay, or an RFLP procedure may be used. 

Presently preferred methods for screening for diabetes according to the above 
general methods comprise: a) obtaining sample nucleic acid from an animal; and b) 
analyzing the nucleic acid to detect a polymorphism in a calpain-encoding nucleic
segment; wherein a polymorphism in the calpain-encoding nucleic acid is indicative of a 
propensity for type 2 diabetes mellitus. 

In other aspects, the invention relates to methods of regulating or preventing 
diabetes in an animal comprising the step of modulating calpain function in the animal.
Such methods often further comprise diagnosing an animal with diabetes via analysis of a 
calpain-encoding nucleic acid sequence as described above. In anticipated preferred 
embodiments, the calpain-encoding sequence is a calpain 10-encoding sequence. 

Modulating calpain function may comprise providing a calpain polypeptide to the
animal. The calpain polypeptide may be a native calpain polypeptide, for example, a
native calpain 10 polypeptide. The native calpain 10 polypeptide may have an amino acid
sequence as set forth in any of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID
NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, or SEQ ID
NO:18, and/or may be encoded by a nucleic acid as set forth in SEQ ID NO:1, SEQ ID
NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13,
SEQ ID NO:15, SEQ ID NO:17, or SEQ ID NO:19. The provision of a calpain
polypeptide may be accomplished by inducing expression of a calpain polypeptide. For
example, the expression of a calpain polypeptide encoded in the animal's genome may
be induced. Alternatively, the expression of a calpain polypeptide encoded by a nucleic acid
provided to the animal may be induced. The provision of a calpain polypeptide may be
accomplished by a method comprising introduction of a calpain-encoding nucleic acid to
the animal. Alternatively, the provision of a calpain polypeptide may be accomplished by
injecting the calpain polypeptide into the animal. In some cases, the modulation of
calpain function in the animal comprises providing a modulator of calpain function to the
animal. For example, the modulator of calpain function may be an agonist or antagonist
of a calpain 10 polypeptide. Alternatively, the modulator of calpain function may
modulate transcription and/or translation of a calpain 10-encoding nucleic acid. In many
cases, modulation will only occur after a diagnosis that an animal has or is susceptible to
diabetes via analysis of a calpain-encoding nucleic acid sequence for a polymorphism.

In other aspects, the invention relates to methods of screening for modulators of
calpain function comprising the steps of: a) obtaining a calpain polypeptide; b)
determining a standard activity profile of the calpain polypeptide; c) contacting the
calpain polypeptide with a putative modulator; and d) assaying for a change in the
standard activity profile. Often, in such methods, the calpain polypeptide is a calpain 10
polypeptide. The standard activity profile of the calpain 10 polypeptide may be
determined by measuring the binding of the calpain 10 polypeptide to a synthetic
substrate. An example of such a synthetic substrate is Suc-Leu-Tyr-AMC (Vilei et al.,
1997). Frequently, obtaining the calpain polypeptide comprises expressing the
polypeptide in a host cell. Although the calpain polypeptide may be isolated away from
the host cell prior to contacting the calpain polypeptide with the putative modulator, in
many assays known to those of skill in the art, it need not be.
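
Steps b) through d) above amount to comparing an activity measurement taken with and without the putative modulator. The Python sketch below shows one simple way such a comparison might be scored. The readings, the 20% threshold, and the function names are assumptions made for illustration; the specification does not prescribe a particular readout, threshold, or classification rule.

```python
from statistics import mean


def percent_change(baseline_readings, treated_readings):
    """Percent change of the mean treated activity relative to the mean baseline."""
    baseline = mean(baseline_readings)
    if baseline == 0:
        raise ValueError("baseline activity must be non-zero")
    return 100.0 * (mean(treated_readings) - baseline) / baseline


def classify_modulator(change_pct, threshold_pct=20.0):
    """Label a compound using a hypothetical +/- threshold on percent change."""
    if change_pct >= threshold_pct:
        return "candidate agonist"
    if change_pct <= -threshold_pct:
        return "candidate antagonist"
    return "no clear effect"


# Hypothetical replicate readings in arbitrary units, not experimental data.
standard_profile = [100.0, 104.0, 96.0]  # activity without the compound
with_compound = [52.0, 48.0, 50.0]       # activity with the putative modulator

change = percent_change(standard_profile, with_compound)
label = classify_modulator(change)
```

A real screen would also need replicate statistics and controls before any compound is called a modulator; the fixed threshold here is only a stand-in for that analysis.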

Preferred methods of screening for modulators of calpain function may comprise
the steps of: a) obtaining a calpain-encoding nucleic acid segment; b) determining a
standard transcription and translation activity of the calpain nucleic acid sequence; c)
contacting the calpain-encoding nucleic acid segment with a putative modulator; d)
maintaining the nucleic acid segment and putative modulator under conditions that
normally allow for calpain transcription and translation; and e) assaying for a change in
the transcription and translation activity.

The invention also relates to calpain modulators prepared by a process comprising 
screening for modulators as described above.

The invention also relates to isolated and purified polynucleotides comprising a
calpain 10-encoding sequence. Such polynucleotides may comprise, for example, a
sequence encoding any of calpain 10a, calpain 10b, calpain 10c, calpain 10d, calpain 10e,
calpain 10f, calpain 10g, calpain 10h, or mouse calpain 10. Such calpains may have an
amino acid sequence as set forth in any of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6,
SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, or SEQ
ID NO:18. The calpain 10-encoding polynucleotide may have a sequence as set forth in
any of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ
ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, or SEQ ID NO:19.

The invention also relates to isolated and purified calpain 10 polypeptides, for
example, polypeptides forming calpain 10a, calpain 10b, calpain 10c, calpain 10d, calpain
10e, calpain 10f, calpain 10g, calpain 10h, or mouse calpain 10. Such polypeptides may
have an amino acid sequence as set forth in any of SEQ ID NO:2, SEQ ID NO:4, SEQ ID
NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16,
or SEQ ID NO:18.

The invention relates to a method of obtaining a calpain 10 polypeptide comprising:
a) obtaining a calpain 10-encoding polynucleotide; b) inserting the obtained
polynucleotide into a host cell; and c) culturing the host cell under conditions sufficient to
allow production of the calpain 10 polypeptide; wherein a calpain 10
polypeptide is thereby obtained. The calpain 10 polypeptide may be any described above,
and may be encoded by any calpain 10-encoding nucleotide described above. Such
methods of obtaining calpain 10 polypeptides may comprise eventually isolating the
calpain 10 polypeptide from the host cell, although this is not required for some
applications.

In some aspects, the invention relates to an isolated and purified polynucleotide 
comprising a sequence encoding the human G-protein coupled receptor as set forth in
SEQ ID NO:21. The invention also relates to an isolated and purified polypeptide 
comprising the amino acid sequence of the human G-protein coupled receptor set forth in 
SEQ ID NO:20. 

The invention further concerns a method of modulating an insulin secretory
response in an animal comprising the step of modulating calpain function in the animal.
Modulating calpain function can be by providing a modulator of calpain function to the
animal. The modulator can be an agonist or antagonist of a calpain polypeptide. In
certain embodiments, the modulator may be an inhibitor of a calpain polypeptide. In
preferred embodiments, the inhibitor inhibits calpain I and/or calpain II. The inhibitor
may be calpeptin or calpain inhibitor 2 (N-Ac-Leu-Leu-methioninal, ALLM).
Alternatively, the inhibitor may be a thiol protease inhibitor, such as E-64-d.

The invention also concerns a method of modulating insulin mediated glucose
transport in an animal comprising the step of modulating calpain function in the animal.
Modulating calpain function can be by providing a modulator of calpain function to the
animal. The modulator can be an agonist or antagonist of a calpain polypeptide. In
certain embodiments, the modulator may be an inhibitor of a calpain polypeptide. In
preferred embodiments, the inhibitor inhibits calpain I and/or calpain II. The inhibitor
may be calpeptin or calpain inhibitor 2 (N-Ac-Leu-Leu-methioninal, ALLM).
Alternatively, the inhibitor may be a thiol protease inhibitor, such as E-64-d.

Other aspects of the invention concern a method of increasing an insulin
secretory response in an animal comprising the step of modulating calpain function in the
animal. Modulating calpain function in the animal can be by providing a modulator of
calpain function to the animal. The modulator of calpain function can be an agonist or
antagonist of a calpain polypeptide. The modulator may be a thiol protease inhibitor,
such as E-64-d.

The invention also concerns a method of treating diabetes in an animal comprising
the step of modulating calpain function in the animal. Modulating calpain function can be
by providing a modulator of calpain function to the animal. The modulator can be an
agonist or antagonist of a calpain polypeptide. In certain embodiments, the modulator
may be an inhibitor of a calpain polypeptide. In preferred embodiments, the inhibitor
inhibits calpain I and/or calpain II. The inhibitor may be calpeptin or calpain inhibitor 2
(N-Ac-Leu-Leu-methioninal, ALLM). Alternatively, the inhibitor may be a thiol protease
inhibitor, such as E-64-d.

The invention further defines methods of treating diabetes by modulating the
function of one or more calpains in at least one of a β-cell, muscle cell, or fat cell with a
modulator of calpain function. Again, modulating calpain function can be by providing a
modulator of calpain function to the animal. The modulator can be an agonist or
antagonist of a calpain polypeptide. In certain embodiments, the modulator may be an
inhibitor of a calpain polypeptide. In preferred embodiments, the inhibitor inhibits
calpain I and/or calpain II. The inhibitor may be calpeptin or calpain inhibitor 2
(N-Ac-Leu-Leu-methioninal, ALLM). Alternatively, the inhibitor may be a thiol protease
inhibitor, such as E-64-d.

The methods for treating diabetes can be further defined as a method comprising
inhibiting calpain activity in a β-cell with a modulator of calpain function, stimulating
calpain activity in a muscle cell or fat cell with a modulator of calpain function, or a
combination of these actions.

BRIEF DESCRIPTION OF THE DRAWINGS



The following drawings form part of the present specification and are included to 
further demonstrate certain aspects of the present invention. The invention may be better 
understood by reference to one or more of these drawings in combination with the 
detailed description of specific embodiments presented herein. 

FIG. 1. Alternative splicing of human calpain 10 mRNA generates a family
of proteins. The patterns of alternative splicing and the organization of the calpain 10
proteins generated by alternative splicing are shown. The four domains that define
calpains are noted as are the amino acid residues that define the boundaries between
domains.

FIG. 2. Physical map of the NIDDM1 region of chromosome 2. This
contig spans a region of about 1.7 Mb (259-266 cM of the genetic map) and is defined by
73 STSs. SNPs (designated UCSNP-1 to -21) are numbered in the order in which they
were identified and studied.

FIG. 3. Organization of the NIDDM1 region. The 49,136 bp region (SEQ
ID NO:1) that was sequenced is shown. The intron-exon organizations of the two genes
found in the sequenced interval, CAPN10 and GPR35, are indicated. The locations of the
SNPs typed in patients and controls are shown. The absolute distances between the two
flanking genes GPC1 and ATSV and this region have not been determined precisely but
are estimated to be <100 kb. VNTR-2 is estimated to be ~4 kb and consist of 100 or
more copies of an imperfect 29 bp repeat (range 26-39 bp), the consensus sequence of
which is TCTCAGAGTGGGGTGAGGCTGTGATGGGG (SEQ ID NO:29). This region
is unstable and was deleted in the BAC and PAC clones that the inventors examined, with
M23N19 having only 12 repeats and p278G8 having only 3. VNTR-2 could not be typed
by PCR™. VNTR-1 is a perfect 19 bp repeat that could be typed by PCR™.

FIG. 4. RNA blot showing expression of calpain 10 mRNA in human
tissues. The positions of RNA size markers are shown on the left.

FIG. 5. Alignment of the predicted amino acid sequence of human calpain
10a with representative members of the large subunit calpain family. The four domains
of the calpains are indicated. This alignment was generated with CLUSTAL X. rCAPN8
(SEQ ID NO:27) and hCAPN9 (SEQ ID NO:28) denote nCL-2 and -4, respectively. The
mouse and rat sequences for calpain 6 (mCAPN6, SEQ ID NO:26) and calpain 8
(rCAPN8, SEQ ID NO:27) are shown. The GenBank accession numbers and sequence
ID listings for the sequences shown here are: hCAPN1, X04366, SEQ ID NO:22;
hCAPN2, M23254, SEQ ID NO:23; hCAPN3, X85030, SEQ ID NO:24; hCAPN5,
Y10552, SEQ ID NO:25; mCAPN6, Y12582, SEQ ID NO:26; rCAPN8, D14479, SEQ
ID NO:27; hCAPN9, AF022799, SEQ ID NO:28; and hCAPN10, AF089088, SEQ ID
NO:2.

FIG. 6. Unrooted phylogenetic tree of calpain large subunit family.
Multiple sequence alignment was performed with CLUSTAL X. The phylogenetic tree
was generated using the neighbor joining method based on the number of amino acid
substitutions. Branch lengths are proportional to the inferred phylogenetic distances. The
tree was drawn using TREEVIEW.

FIG. 7A and FIG. 7B. FIG. 7A. Interaction between NIDDM1 and CYP19.
Multipoint allele-sharing analysis of chromosome 15 weighted by the evidence for
linkage at NIDDM1 on chromosome 2. FIG. 7B. Interaction between NIDDM1 and
CYP19. Multipoint allele-sharing analysis of chromosome 2 weighted by the evidence
for linkage at CYP19 on chromosome 15.

FIG. 8A, FIG. 8B, FIG. 8C and FIG. 8D. Effect of protease inhibitors
on the insulin secretory response to glucose in mouse pancreatic islets. FIG. 8A and FIG.
8B. Insulin secretion by mouse islets incubated in the presence of 2 mM glucose (open
bars) and 20 mM glucose (hatched bars) in the absence and presence of 200 µM E-64-d
(FIG. 8A) and 250 nM ALLM (FIG. 8B). Results are mean ± SEM of 6 studies in each
case. *p<0.05 compared to islets incubated in 20 mM glucose in the absence of calpain
inhibitors. FIG. 8C and FIG. 8D. Effect of increasing concentrations (µM) of E-64-d
(FIG. 8C) and ALLM (FIG. 8D) on the insulin secretory response to 2 mM glucose (open
bars) and 20 mM glucose (hatched bars). Results are mean ± SEM of 5-6 studies in each
group. *p<0.05 compared to islets incubated in the absence of calpain inhibitors.

FIG. 9A, FIG. 9B and FIG. 9C. Effect of protease inhibitors on the insulin
secretory response to glucose and other secretagogues in mouse pancreatic islets.
FIG. 9A. Insulin secretion by islets incubated at various glucose concentrations in the
absence (open bars) and presence (hatched bars) of 100 µM ALLM. Results are mean ±
SEM of 4-7 studies per group. *p<0.05 compared to islets incubated in the absence of
ALLM. FIG. 9B. Insulin secretion by perifused islets in response to stimulation with 20
mM glucose (6-20 min, solid bar). The perifusate contained 2 mM glucose except where
shown. Islets were preincubated for 4 hr either in the absence of calpain inhibitors (■) or
in the presence of 100 µM ALLM (•) or 200 µM E-64-d (▲). In studies involving
inhibitors, ALLM was present throughout the study but E-64-d, which is an irreversible
cysteine protease inhibitor, was present only during the pre-incubation. Results are mean
± SEM of 3 studies in each group. FIG. 9C. Insulin secretion by mouse islets incubated
in the presence of 2 mM glucose (2), 8 mM glucose (8), 250 µM carbachol (CCh) or 50
nM GLP-1 (GLP-1) in the presence of 8 mM glucose and 30 mM KCl in the presence of
2 mM glucose (KCl). Islets were incubated either in the absence (open bars) or presence
(hatched bars) of 100 µM ALLM. Results are mean ± SEM of 6 separate studies.
*p<0.05 compared to islets incubated in the absence of ALLM.

FIG. 10A and FIG. 10B. Measurements of membrane capacitance in isolated
β-cells. Capacitance measurements reveal a large increase in insulin secretion after
pretreatment with ALLM (100 µM). Using the perforated whole-cell recording
configuration, β-cells were stimulated with a train of ten step depolarizations to +20 mV
(HP = -80 mV). Each depolarization lasted 150 ms and was separated by 400 ms
interpulse duration. FIG. 10A. Representative capacitance traces obtained from a control
β-cell (top) and from a different cell pre-treated with ALLM (100 µM, bottom).
FIG. 10B. Average peak change in membrane capacitance elicited by trains of
depolarizations from control (open bar, n = 9) and ALLM pre-treated cells (hatched bar,
n = 11). Data are mean ± SEM, * indicates p<0.05.

FIG. 11A, FIG. 11B, FIG. 11C, FIG. 11D, FIG. 11E and FIG. 11F. Effect
of protease inhibitors on [Ca2+]i, whole cell calcium currents and NAD(P)H responses to
glucose in mouse islets. FIG. 11A and FIG. 11B. [Ca2+]i responses to 14 mM glucose
(open bar), washout (2 mM glucose) and stimulation with 30 mM KCl in the continued
presence of 2 mM glucose (filled bar). 340/380 ratio is an indirect measure of
intracellular free calcium ([Ca2+]i). Islets were preincubated for 4 hr either in the absence
(FIG. 11A) or presence (FIG. 11B) of 100 µM ALLM inhibitor-2. Similar results were
obtained with E-64-d. FIG. 11C. ALLM pre-treatment did not alter whole-cell calcium
currents recorded in β-cells. Representative calcium current recordings obtained from a
control cell (0.1% DMSO, top) and from a different cell pre-treated with ALLM (100
µM, bottom) are shown. FIG. 11D. The average peak calcium current density (peak
current divided by cell size) for control (left, open bar, 34.2 ± 2.2 pA/pF, n = 18) and for
ALLM (100 µM) pre-treated cells (right, hatched bar, 36.7 ± 3.9 pA/pF, n = 15).
FIG. 11E and FIG. 11F. Changes in NAD(P)H fluorescence in response to stimulation
with 14 mM glucose (open bar) in mouse islets. Islets were preincubated for 4 hr either
in the absence (FIG. 11E) or presence (FIG. 11F) of 100 µM ALLM inhibitor-2.

FIG. 12A and FIG. 12B. Effects of protease inhibitors on calpain activity in
islets. FIG. 12A. Mouse islets were preincubated for 4 hr either in the absence of
inhibitors (■) or in the presence of 200 µM ALLM (•) or 200 µM E-64-d (▲). Islets
were then incubated in KRB containing 10 µM Boc-Leu-Met-CMAC from 0 min and
fluorescence emitted by the calpain proteolytic product was measured following
excitation by light at 340 nm. Data represent mean ± SEM of 3-4 separate experiments.
FIG. 12B. The area under the curve (AUC) of fluorescence generation in the absence of
calpain inhibitors (open bar, n=4) and in the presence of ALLM (hatched bar, n=3) and
E-64-d (solid bar, n=4) were compared. *p<0.05, compared to islets incubated in the
absence of inhibitor.

FIG. 13A, FIG. 13B and FIG. 13C. Effects of protease inhibitors on
insulin action in adipocytes and skeletal muscle. FIG. 13A. Effects of insulin alone (■)
or insulin in the presence of 100 µM ALLM (•) or 200 µM E-64-d (▲) on
2-deoxyglucose uptake into rat adipocytes. Insulin concentrations (nmol/L) are shown on
the horizontal axis. * denotes p<0.05 compared to cells incubated in the absence of
insulin. FIG. 13B. Effect of ALLM (100 µM) and E-64-d (200 µM) on 2-deoxyglucose
uptake by skeletal muscle. Soleus muscle strips from normal adult male rats were
incubated in the absence (open bars) or presence (hatched bars) of 12 nM insulin and in
the absence (control) and presence of protease inhibitors as shown. Results are mean ±
SEM of 5 separate studies. # p<0.05 compared to muscles incubated in the absence of
inhibitor. FIG. 13C. Effect of ALLM and E-64-d on glycogen synthesis rates in skeletal
muscle. Muscle strips were incubated in the absence (open bars) or presence (hatched
bars) of 6 nM insulin and in the absence (control) and presence of inhibitor as shown.
Results are mean ± SEM of 6 separate studies. * p<0.05 compared to muscles incubated
in the absence of insulin, # p<0.05 compared to muscles incubated in the absence of
inhibitor.

FIG. 14. Effect on glucose stimulated insulin secretion by islets following 
48 hour exposure to calpain inhibitors, ALLM or E64-d. 

FIG. 15. Insulin content of islets following 48 hour exposure to calpain 
inhibitors, ALLM or E64-d. 

FIG. 16. Dose response of glucose stimulated insulin secretion by calpain
inhibitor II.

FIG. 17. Long term inhibitory effects of calpain inhibitors on glucose
stimulated insulin secretion of perfused islets.

FIG. 18. Recovery of normal glucose stimulated insulin secretion after two
days in islets following 48 hour calpain inhibitor treatment.

FIG. 19. Stimulated insulin secretory response to glyceraldehyde,
keto-isocaproic acid (KIC) and KCl following 48 hour calpain inhibitor treatment.

FIG. 20. Stimulated insulin secretory response to mastoparan and carbachol
following 48 hour calpain inhibitor treatment.

FIG. 21. Intracellular free calcium responses to glucose, KIC and KCl in
islets following 48 hour calpain inhibitor treatment.

FIG. 22. Glucose metabolism in islets following 48 hour calpain inhibitor
treatment.

FIG. 23. NAD(P)H autofluorescence changes in response to glucose or KIC
in islets following 48 hour calpain inhibitor treatment.

FIG. 24. Lower rates of exocytosis in beta cells following 4 day calpain 
inhibitor treatment. 

FIG. 25. Enlarged acidic vesicles in beta cells following 48 hour calpain 
inhibitor treatment. 

FIG. 26. Residual calpain activity in intact islets following 48 hour calpain
inhibitor treatment.

DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS



Although it has been known for many decades that an absolute or relative
deficiency of the hormone insulin leads to diabetes, the genetic basis of
susceptibility to diabetes remains elusive. Diabetes is a major cause of health difficulties
in the United States. Type 2 diabetes mellitus (also referred to as non-insulin-dependent
diabetes mellitus, NIDDM) is a major public health disorder of glucose homeostasis
affecting about 5% of the general population in the United States. The causes of the
fasting hyperglycemia and/or glucose intolerance associated with this form of diabetes are
not well understood.

Type 2 diabetes usually has its onset in mid-life or later. Maturity-onset
diabetes of the young (MODY) is a form of diabetes mellitus that is characterized by an
early age at onset, usually before 25 years of age, and an autosomal dominant mode of
inheritance (Fajans, 1989). Except for these features, the clinical characteristics of
patients with MODY are similar to those with the more common late-onset form(s) of type 2
diabetes.
The genes for susceptibility to MODY have been characterized and described in WO
98/11254, which is the PCT counterpart to U.S. Patent Application 08/927,219, filed
September 9, 1997. These documents are incorporated herein by reference in their
entirety, as providing disclosure of diagnostic and prognostic aspects of MODY.

Type 2 diabetes results from the joint action of multiple genetic and 
environmental factors. Linkage studies have led to the localization of susceptibility genes 
for type 2 diabetes in Mexican Americans, in the linguistically-isolated Swedish-speaking 
population living in the Botnia region on the western coast of Finland, and in the Pima 
Indians of the southwestern United States. Each study localized susceptibility to largely 
different regions of the genome suggesting that different combinations of susceptibility 
genes are responsible for type 2 diabetes in these various populations. 

The genome-wide search for type 2 diabetes genes in the Mexican American
community of Starr County, Texas, localized a major susceptibility locus, NIDDM1, to
the region of D2S125-D2S140 (multipoint lod score = 4.03, P = 8.2 × 10^-6). The inventors'
results and those of others indicate that NIDDM1 has a less important role in determining
diabetes susceptibility in non-Hispanic white (German, French, Sardinian, British and
Finnish) and Asian (Japanese) populations than it does in Mexican Americans (Hanis et
al., 1996; Mahtani et al., 1996; Hanis et al., 1997; Thomas et al., 1997; Ciccarese et al.,
1997; Gosh et al., 1998).
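
For orientation, a lod score is the base-10 logarithm of the likelihood ratio favoring linkage, so the multipoint lod score reported above can be converted to odds. The short sketch below illustrates only that arithmetic; the function name is hypothetical and the snippet is not part of the inventors' analysis.

```python
def lod_to_odds(lod: float) -> float:
    """Convert a lod score (log10 likelihood ratio) into odds in favor of linkage."""
    return 10.0 ** lod


# Multipoint lod score reported for the D2S125-D2S140 region.
odds = lod_to_odds(4.03)
print(f"lod 4.03 corresponds to odds of roughly {odds:,.0f} to 1 in favor of linkage")
```

On this scale, each additional unit of lod score multiplies the odds in favor of linkage by ten.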

The inventors' strategy for positionally cloning NIDDM1 was designed to
capitalize on linkage disequilibrium, if it is present, but still recognize disease-associated
variation in its absence by utilizing information on the interaction between NIDDM1 and
other susceptibility loci. Here, the inventors demonstrate that it is possible to positionally
clone a gene for a complex disorder solely on the basis of its map position using standard
molecular genetic methods coupled with novel analytic techniques. The inventors show
that NIDDM1 encodes a novel calpain-like cysteine protease that the inventors have given
the name diabetes calpain or "diapain." This result defines a new pathway leading to the
development of type 2 diabetes.

To determine whether an association between the presence of NIDDM1 and
increased risk for the development of type 2 diabetes could be detected in a predisposed
population, 106 Mexican American subjects from Starr County, Texas, were
selected, each of whom had at least two first degree relatives with type 2 diabetes but
none of whom had a personal history of previously diagnosed diabetes. The inventors
found strong physiological evidence for an important role of this gene as a primary cause
of type 2 diabetes. More particularly, the present invention shows that there is a
combination of pathophysiological defects (insulin resistance, impaired glucose tolerance
and defective insulin secretion) in subjects who are homozygous GG at UCSNP-43 prior
to the onset of overt type 2 diabetes. These results are briefly discussed herein below and
discussed in further detail in the Examples.

The inventors used oral glucose tolerance testing to monitor pathophysiological
abnormalities associated with NIDDM1. This is a standard test used to measure the
response of islet cells to a glucose bolus and is currently recognized as the test in most
widespread use for diabetes detection. The normal range of fasting glucose
concentrations is below 110 mg/dl. Following glucose ingestion, glucose concentrations
increase. Normal glucose tolerance is defined by a value below 140 mg/dl;
any individual having a glucose concentration above this threshold is defined, by
WHO criteria, as having impaired glucose tolerance.
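
The threshold rule described above can be sketched as a small classifier. This is a minimal illustration only: the function and constant names are hypothetical, treating a value of exactly 140 mg/dl as impaired is an assumption (the text speaks only of values below and above the threshold), and the sketch applies only the single 140 mg/dl cut-off given in the text, without attempting to distinguish impaired glucose tolerance from overt diabetes.

```python
# Post-load glucose threshold taken from the text (mg/dl).
GLUCOSE_TOLERANCE_THRESHOLD_MG_DL = 140.0


def classify_post_load_glucose(glucose_mg_dl: float) -> str:
    """Classify a post-load glucose value using the single 140 mg/dl threshold.

    Assumption: a value exactly equal to the threshold is reported as impaired.
    """
    if glucose_mg_dl < 0:
        raise ValueError("glucose concentration cannot be negative")
    if glucose_mg_dl < GLUCOSE_TOLERANCE_THRESHOLD_MG_DL:
        return "normal glucose tolerance"
    return "impaired glucose tolerance"


for value in (118.0, 139.9, 152.0):
    print(value, "mg/dl ->", classify_post_load_glucose(value))
```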

Using subjects possessing a family history of diabetes who do not have diabetes
themselves but who are homozygous GG at UCSNP-43, the inventors were able to
demonstrate a number of abnormalities by oral glucose tolerance testing. First, these
individuals demonstrate fasting hyperinsulinemia, suggesting the presence of insulin
resistance. Second, these individuals were shown to have elevated average plasma
glucose concentrations 120 min. after oral ingestion of 75 g glucose, to within a range
that defines impaired glucose tolerance, a condition widely recognized to be associated
with a significantly increased risk for the subsequent development of type 2 diabetes.
Further, these individuals characteristically have reduced insulin concentrations 30 min.
after ingestion of 75 g glucose. Reduced insulin concentrations in response to the oral
ingestion of nutrients are one of the hallmarks of type 2 diabetes. A similar defect is
therefore present in subjects homozygous GG at UCSNP-43 even before the onset of
diabetes.

Thus, the present invention concerns the early detection, diagnosis, prognosis and
treatment of type 2 diabetes. The present invention describes for the first time a sequence
and mutations in a diapain gene responsible for type 2 diabetes susceptibility. The specific
mutation and the identity of the corresponding wild-type genes from diabetic subjects are
disclosed. These mutations are indicators of type 2-related diabetes and are diagnostic of
the potential for the development of diabetes. It is envisioned that the techniques disclosed
herein will also be used to identify other gene mutations responsible for other forms of
diabetes.

Those skilled in the art will realize that the nucleic acid sequences disclosed will
find utility in a variety of applications in diabetes detection, diagnosis, prognosis and
treatment. Examples of such applications within the scope of the present invention include
amplification of markers of NIDDM using specific primers; detection of markers of diapain
by hybridization with oligonucleotide probes; incorporation of isolated nucleic acids into
vectors and expression of vector-incorporated nucleic acids as RNA and protein;
development of immunologic reagents corresponding to gene encoded products; and
therapeutic treatment for the identified type 2 diabetes using these reagents as well as
anti-sense nucleic acids or other inhibitors specific for the identified type 2 diabetes. The
present invention further discloses screening assays for compounds to upregulate gene
expression or to combat the effects of the calpain 10 gene(s).

A. Diabetes 

Diabetes mellitus affects approximately 5% of the population of the United States
and over 120 million people worldwide (King et al., 1998; Harris et al., 1992). A better
way of identifying the populace who are at risk of developing diabetes is needed, as a
subject may have normal plasma glucose concentrations but may be at risk of developing
overt diabetes. These issues could be resolved if it were possible to diagnose susceptible
people before the onset of overt diabetes. This is presently not possible with subjects
having classical diabetes due to its multifactorial nature.

The clinical characteristics that are seen in patients with type 2 diabetes include
frequent severe fasting hyperglycemia, the need for oral hypoglycemic agents, eventual
insulin requirements, and vascular and neuropathic complications (Fajans et al., 1994;
Menzel et al., 1995).

The number of genes and allelic variants that influence the development of a
complex trait such as type 2 diabetes is uncertain. The inventors' studies in the Mexican
American population of Starr County, Texas, indicate that type 2 diabetes in this group
results from the interactions of a major gene that accounts for about 35% of the familial
clustering with perhaps as many as 11 loci of smaller effect (Hanis et al., 1996). The
inheritance of type 2 diabetes in this population appears to be oligogenic with interactions
between 2-3 loci in each individual being the primary determinant of susceptibility.

While linkage studies have shown that it is possible to map susceptibility genes,
the identification of the gene and nucleotide variants that influence susceptibility is a
forbidding task and one for which there is no precedent. Here, the inventors have shown
that it is possible to find the one gene in 100,000 and the one nucleotide in 3,300,000,000
that affect susceptibility using positional cloning strategies employed for the
identification of mutations in single-gene disorders together with novel analytic
techniques. The implications for studies of complex disorders are clear.

Genetic studies of disorders such as type 2 diabetes, which have onset in
middle age, pose a number of challenges. The late age at onset (the mean age at diagnosis
for men and women in the inventors' study population was 50.0 ± 11.6 and 48.7 ± 10.7
(mean ± SD) years, respectively) makes it difficult to identify complete nuclear families.
One or both parents are often not available, in part because of the early mortality
associated with diabetes, and the children of affected individuals have not yet developed
the condition. The inventors have used affected sib-pairs without parents in their studies,
which limits the types of analyses that can be used to assess linkage disequilibrium to
comparisons of allele frequencies between cases and controls from the same community.
While this may not be considered optimal, it did not hinder the inventors' search for
NIDDM1 and it is unclear that the search would have proceeded differently even if
nuclear families had been available, thus permitting the use of robust tests of linkage
disequilibrium such as the transmission/disequilibrium test (Spielman and Ewens, 1998).

The identification of NIDDM1 proceeded simultaneously with the generation of a
physical map, the boundaries of which were defined by the 1-lod confidence interval, the
identification of diallelic polymorphisms, usually SNPs, in both ESTs and STSs, and tests
of linkage and association of these polymorphisms, and haplotypes formed from adjacent
polymorphisms, with type 2 diabetes. The results of each analysis refined the focus of the
inventors' search, finally identifying a G-to-A polymorphism (UCSNP-43) as being
responsible for all the evidence for linkage with type 2 diabetes. This polymorphism acts
in a recessive manner with individuals homozygous for the high frequency G-allele being
at increased risk of developing diabetes.

The inventors identified 166 DNA polymorphisms during the course of this study,
of which 62 were typed in at least 100 affected and 100 random controls, usually by DNA
sequencing. In addition, the inventors resequenced a 50 kb region in ten individuals to
ensure that they had identified all common variants in the region and could exclude each
as being the basis of the evidence of linkage with type 2 diabetes. No other
polymorphism exhibited the magnitude of effect attributable to the variation at
UCSNP-43. It seems unlikely that the effects at UCSNP-43 are due to chance and that
NIDDM1 is another of the SNPs examined that account for substantially less of the
evidence of linkage. It is also unlikely that the actual variant lies outside the region that
the inventors have sequenced. Such a variant would be more than 6,225 bp from
UCSNP-43 and it would have to be in strong linkage disequilibrium with UCSNP-43 and
have a similar frequency. Such a combination is unlikely given the admixture of the
Mexican American population: the alleles would have to have been present at similar
frequencies in both major founder populations. Nor do the inventors believe that the
original evidence for linkage was a false positive and they have merely localized that
false positive signal to a single polymorphism. This is unlikely since UCSNP-43
accounts for all the evidence for linkage not only in the original data set but also in a
smaller replicate sample. The results strongly suggest that UCSNP-43 is the variant
responsible for the effects attributed to NIDDM1.

The nucleotide variant responsible for all the evidence for linkage with type 2
diabetes is located in the gene encoding a novel calpain, calpain 10. The role of this
novel calpain in diabetes is discussed at length herein below.

Existing Diabetes Therapies 

Sulfonylureas exert hypoglycemic action and inhibit potassium channel transport
by binding to proteins at the potassium channel. Of the compounds commonly known as
sulfonylureas, glyburide is considered the most potent because it binds most firmly and
for a longer time to the 140 kDa protein at the potassium channel of all tissues of the
body. Micronized glyburide or small particle glyburide is absorbed more rapidly from the
gastrointestinal tract than non-micronized glyburide.

Oral hypoglycemic agents such as tolazamide, tolbutamide, chlorpropamide,
micronized and non-micronized glyburide, glimepiride, glipizide, metformin, and
phenformin have been available as oral treatments for diabetes, typically non-insulin
dependent (Type II) diabetes. Oral hypoglycemic agents in general are disadvantageous
because the extent and duration of the antidiabetic effect are unpredictable
and these agents are often characterized by primary or secondary failure. Because oral
hypoglycemic agents exhibit inconsistent hypoglycemic benefit, insulin therapy is
preferred.

For those diabetics in whom current oral medication does not offer sufficient
control of their condition, insulin injections are necessary. Daily injections carry a number
of risks, including hypoglycemia, wide fluctuations in glucose concentrations requiring
multiple daily serum glucose determinations and multiple insulin injections, and strict
dietary control, which then leads to the issue of poor compliance. Other disadvantages
include difficulty in self administration of an accurate dose, especially by elderly or
infirm patients. Epidemiological data show that over 85% of insulin treated diabetics
in the United States are poorly controlled. As a result, 150 billion dollars per year is spent
treating the devastating complications of the illness.

Some patients are virtually impossible to treat with insulin because their cells
cannot effectively utilize or are resistant to insulin therapy. As a result of the lack of
glycemic control, diabetic patients often experience a variety of conditions including:
neuropathy, nephropathy, cardiomyopathy, retinopathy, coronary and peripheral vascular
disease and the like. These complications occur due to the unachieved glycemic control
that results from failure of the insulin, diet and/or exercise only approach.



Diabetes refers to a disease process derived from multiple causative factors and
characterized by elevated levels of plasma glucose or hyperglycemia. Uncontrolled
hyperglycemia is associated with increased and premature mortality due to an increased
risk for microvascular and macrovascular diseases, including nephropathy, neuropathy,
retinopathy, hypertension, stroke, and heart disease. Therefore, control of glucose
homeostasis is a critically important approach for the treatment of diabetes.

Type I diabetes (IDDM) is the result of an absolute deficiency of insulin, the
hormone which regulates glucose utilization. Type II, noninsulin dependent diabetes
mellitus (NIDDM), is due to a profound resistance to insulin's stimulating or regulatory
effect on glucose and lipid metabolism in the main insulin-sensitive tissues: muscle, liver
and adipose tissue. This resistance to insulin responsiveness results in insufficient insulin
activation of glucose uptake, oxidation and storage in muscle and inadequate insulin
repression of lipolysis in adipose tissue and of glucose production and secretion in liver.

The several treatments for NIDDM, which have not changed substantially in many
years, all have limitations. While physical exercise and reductions in dietary intake of
calories will dramatically improve the diabetic condition, compliance with this treatment
is very poor because of well-entrenched sedentary lifestyles and excess food
consumption, especially of high fat-containing food. Increasing the plasma level of insulin
by administration of sulfonylureas (e.g., tolbutamide, glipizide), which stimulate the
pancreatic β-cells to secrete more insulin, or by injection of insulin after the response
to sulfonylureas fails, will result in high enough insulin concentrations to stimulate the
very insulin-resistant tissues. However, dangerously low levels of plasma glucose can
result from these last two treatments, and increasing insulin resistance due to the even
higher plasma insulin levels could theoretically occur. The biguanides increase insulin
sensitivity, resulting in some correction of hyperglycemia. However, the two biguanides,
phenformin and metformin, can induce lactic acidosis and nausea/diarrhea, respectively.

Thiazolidinediones (glitazones) are a recently disclosed class of compounds that
are suggested to ameliorate many symptoms of NIDDM. These agents increase insulin
sensitivity in muscle, liver and adipose tissue in several animal models of NIDDM,
resulting in complete correction of the elevated plasma levels of glucose, triglycerides
and nonesterified free fatty acids without any occurrence of hypoglycemia. However,
serious undesirable effects have occurred in animal and/or human studies, including
cardiac hypertrophy, hemodilution and liver toxicity, resulting in few glitazones
progressing to advanced human trials.

Biguanide drugs, while not used in this country, are being tested in clinical trials
as hypoglycemic agents (Katzung p. 598-599). Likewise, pioglitazone is being tested in
clinical trials as a hypoglycemic agent (Hoffman and Colca, Diabetes Care, 15:1075-1078,
1992; Kobayashi et al., Diabetes, 41:476-483, 1992). Although these agents are
being tested to evaluate usefulness in decreasing insulin resistance, no mechanism has
been described to explain how they exert their effects. As has been found with
sulfonylureas, the biguanides and pioglitazone may be found to be ineffective in a large
percentage of patients, or the effectiveness of the agents may decline with long-term use.
New therapeutic agents to decrease insulin resistance need to be identified and brought
to clinical practice.

B. Calpain

Calpain is a calcium-activated neutral protease, also known as CAPN; EC
3.4.22.17. It is an intracellular cysteine protease which is ubiquitously expressed in
mammalian tissues (Aoki et al., 1986). Calpain has been implicated in many degenerative
diseases including, but not limited to, neurodegeneration (Alzheimer's disease,
Huntington's disease, and Parkinson's disease), amyotrophy, stroke, motor neuron
damage, acute central nervous system (CNS) injury, muscular dystrophy, bone resorption,
platelet aggregation, and inflammation.

Mammalian calpain, including human calpain, is multimeric. It consists of two
different subunits, which are a 30 kDa subunit and an 80 kDa subunit, and, therefore, is a
heterodimer. There are two forms of calpain, calpain I (μ-calpain, μCAPN) and
calpain II (m-calpain, mCAPN), which differ in their sensitivities to the concentration of
calcium necessary for activation. Calpain I requires only low micromolar concentrations
of calcium for activation, whereas calpain II requires high micromolar or millimolar
levels (Aoki et al., 1986; DeLuca et al., 1993). The same 30 kDa subunit is common to
both forms. The two human calpains differ in the sequences of the DNA encoding their
80 kDa subunit, sharing 62% homology. There is evidence that the 80 kDa subunit is
inactive, but that it is autolyzed to a 76 kDa active form in the presence of calcium
(Zimmerman et al., 1991). The large catalytic subunit can be divided into four domains:
domain I, the N-terminal regulatory domain that is processed upon calpain activation;
domain II, the protease domain homologous to papain; domain III, a linker domain of
unknown function; and domain IV, the calmodulin-like Ca2+-binding domain.

Calpain Inhibitors

Commercially available in vitro inhibitors of Calpain include peptide aldehydes
such as leupeptin (Ac-Leu-Leu-Arg-H), as well as epoxysuccinates such as E-64. These
compounds are not useful in inhibiting Calpain in vivo because they are poorly membrane
permeant. Also, many of these inhibitors are poorly specific and will inhibit a wide
variety of proteases in addition to Calpain. These commercially available compounds are
based upon peptide structures that are believed to interact with the substrate binding site
of Calpain. Active groups associated with the Calpain inhibitors then either block or
attack the catalytic moiety of Calpain in order to inhibit the enzyme.

In addition, other types of compounds that are thought to possess in vitro Calpain
inhibitory activity but are not commercially available have been reported. Several
classes of calpain inhibitors have been identified and found to provide protection against
a variety of neurodegenerative diseases and conditions (Bartus et al., WO 92/11850).
Other examples of calpain inhibitors include the peptide diazomethanes (Rich, 1986).
These peptide diazomethanes are similarly thought to be poorly membrane permeant and
non-specific.

Calpeptin is another calpain inhibitor (Tsujinaka et al., 1988). It was created by
modifying the N-terminus of Leu-norleucinal or Leu-methioninal to obtain a cell
penetrative peptide inhibitor against calpain. Calpeptin is a potent synthetic inhibitor in
terms of preventing the Ca2+-ionophore induced degradation of actin binding protein and
P235 in intact platelets.

Calpain inhibitor 2 (N-Ac-Leu-Leu-methioninal, ALLM) has been used in a
number of studies looking at its effects on normal cellular physiology. These include
secretion from isolated rat alveolar epithelial cells (Zimmerman et al., 1995), muscle cell
differentiation (Ueda et al., 1998), apoptosis in embryonic chicken neurons (Villa et al.,
1998), and extralysosomal proteolysis in cells, such as what occurs following cellular
injury (Posmantur et al., 1997; Figueiredo-Pereira et al., 1994). Calpain inhibitor 2
preferentially inhibits milli (m)-calpain, while calpain inhibitor 1
(N-acetyl-leucyl-leucyl-norleucinal) preferentially inhibits micro (mu)-calpain.

There is some evidence that particular inhibitors of Calpain have certain
therapeutic utilities. For example, leupeptin can facilitate nerve repair in primates.
Loxastatin (also known as EST, Ep-460 or E-64d), a derivative of E-64, is believed to
have utility in the treatment of muscular dystrophy. E-64d, while not having significant
protease inhibitory activity itself, is believed to be converted to more potent forms, such
as E-64c, inside a mammalian body. E-64 is commercially available from CalBiochem
(San Diego, Calif.), Sigma Chemical Co. (St. Louis, Mo.), and Boehringer Mannheim
(Indianapolis, Ind.). Other acceptable thiol protease inhibitors include analogs of E-64
(Hashida et al., 1980; Barrett et al.; Hanada et al., 1983) and the reversible protease
inhibitor, leupeptin (Umezawa, 1976).

Calpastatin 

Endogenous protein inhibitors of calpains, called calpastatins, are heat-stable
polypeptides with high specificity for calcium-dependent proteinases. Calpastatins are
essential factors in the in vivo regulation of CAPN activity, and perturbations of the ratio
of inhibitor to enzyme in non-neural tissues have the predicted consequences on CAPN
activity in cells.

Calpastatins, the specific protein inhibitors of CAPN, are also widely distributed
among tissues. First identified in 1978 (Waxman et al., 1978), calpastatins have since
been purified from several different sources. Although each of the purified species shares
the properties of heat stability and strict specificity for CAPN, there is no consensus on
the number of forms of calpastatin within single cells or among different cell types. The
recent characterization of a calpastatin cDNA isolated from a rabbit cDNA library (Emori
et al., 1987) revealed a deduced sequence of 718 amino acid residues (Mr = 76,964)
containing four consecutive internal repeats of approximately 140 amino acid residues,
each expressing inhibitory activity (Emori et al., 1987). This deduced molecular weight is
significantly lower than the molecular weight of rabbit skeletal muscle calpastatin
(Mr = 110,000), suggesting that the inhibitor migrates anomalously on SDS gels and
may be post-translationally modified.

Other studies suggest that additional molecular forms of calpastatin may be
present in tissues. Although 110 kDa calpastatin is observed in rabbit and bovine skeletal
muscle (Nakamura et al., 1985; Otsuka et al., 1987), porcine cardiac muscle (Takano et
al., 1986) and human liver (Imajoh et al., 1984), other molecular forms of calpastatin
have also been isolated, including a 68 kDa form from chick skeletal muscle (Ishiura et
al., 1982) and porcine erythrocytes (Takano et al., 1986), a 50 kDa heterodimer from
rabbit skeletal muscle (Nakamura et al., 1984) and 34 kDa forms from rabbit skeletal
muscle (Takahashi-Nakamura et al., 1981) and rat liver (Yamato et al., 1983). The
sensitivity of calpastatin to proteolysis has suggested that smaller polypeptide chains
containing inhibitory activity might be derived from larger precursors during purification,
or in vivo. Although certain of these low molecular weight calpastatins resemble the
higher molecular weight forms, their derivation from the same gene product has not been
established.

Calpastatin, a specific inhibitory protein of calpain, is known and is
expected to be applicable as an effective therapeutic agent for various excessive
calpain-related syndromes. Calpastatin is, however, a high molecular weight protein and
hence will be difficult to use as a medicine.

Calpain 10 is a Novel Calpain-like Protease. 

Calpain 10 is a "diapain" that has been identified by the present invention.
Calpain 10 is encoded by a sequence in a 49,136 base pair region located on chromosome
2 (SEQ ID NO:1). The following list shows the exon regions of this 49,136 base pair
region that are differentially spliced to create mRNAs encoding different isoforms of
calpain 10. Nucleotide positions (nt) are shown relative to SEQ ID NO:1.

Exon 1    nt 1235 - 1515 (cds 1375 - 1515)
Exon 2    nt 3813 - 3944
Exon 3    nt 5283 - 5479 or Exon 3* nt 5283 - 5468
Exon 4    nt 6401 - 6618
Exon 5    nt 8373 - 8514
Exon 6    nt 9010 - 9175 (TGA, 9013-9015)
Exon 7    nt 9491 - 9771
Exon 8    nt 10400 - 10618 (TGA, 10455-10457)
Exon 9    nt 10785 - 10987
Exon 10   nt 11147 - 11408 or Exon 10* nt 11316 - 11408
Exon 11   nt 12354 - 12553 (TGA, 12412-12414)
Exon 12   nt 12818 - 12863
Exon 13   nt 13117 - 13569 (TAA, 13144-13146)
Exon 14   nt 30857 - 30980
Exon 15   nt 31446 - 32175 (TGA, 31466-31468)
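
As an illustration of how the exon coordinates listed above can be used, the following sketch computes exon lengths and the size differences introduced by the alternative exons 3* and 10*. It assumes that the listed positions are 1-based and inclusive (an assumption, not stated in the text) and closes up the spacing of the exon 10 coordinates as printed; the dictionary and function names are hypothetical.

```python
# Exon boundaries (start, end) relative to SEQ ID NO:1, as listed in the text.
# Assumption: positions are 1-based and inclusive.
EXONS = {
    "1": (1235, 1515), "2": (3813, 3944), "3": (5283, 5479), "3*": (5283, 5468),
    "4": (6401, 6618), "5": (8373, 8514), "6": (9010, 9175), "7": (9491, 9771),
    "8": (10400, 10618), "9": (10785, 10987), "10": (11147, 11408),
    "10*": (11316, 11408), "11": (12354, 12553), "12": (12818, 12863),
    "13": (13117, 13569), "14": (30857, 30980), "15": (31446, 32175),
}


def exon_length(name: str) -> int:
    """Length in nucleotides of the named exon, counting both end positions."""
    start, end = EXONS[name]
    return end - start + 1


for name in EXONS:
    print(f"Exon {name:>3}: {exon_length(name):4d} nt")

# Nucleotides removed when the alternative splice sites are used.
print("Exon 3 vs 3*:", exon_length("3") - exon_length("3*"), "nt shorter")
print("Exon 10 vs 10*:", exon_length("10") - exon_length("10*"), "nt shorter")
```

Summing the lengths of the exons used by a given isoform would give an approximate transcript length, subject to the same coordinate assumption.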

There are a number of calpain 10 isoforms that result from alternative splicing of 
the CAPN10 gene. Alternative splicing generates eight related but structurally distinct 
proteins. The structures of the mRNAs encoding each isoform are defined by unique 
combinations of exons and splice donor and acceptor sites (see Table 1, FIG. 1). 



Table 1. Description of calpain 10 isoforms.

Calpain 10 Isoform    Encoded by Exons      Polypeptide Length (aa)    SEQ ID NO
Calpain 10a           1-7, 9-13             672                        2
Calpain 10b           1-7, 9, 10*, 11-13    544                        4
Calpain 10c           1-7, 11-13            517                        6
Calpain 10d           1-7, 9, 11-13         513                        8
Calpain 10e           1-10*, 11-13          444                        10
Calpain 10f           1-3*, 4-7, 9-13       274                        12
Calpain 10g           1, 2, 14, 15          139                        14
Calpain 10h           1, 11-13              138                        16


There is a G/A polymorphism at nt 6225 (relative to SEQ ID NO:1) in intron 3
of the Calpain 10 gene that is responsible for the evidence of linkage with type 2 diabetes.
Further, the region contains GPR35 (a G-protein coupled receptor most closely related in
sequence to the human ATP receptor subtype P2Y9; amino acid sequence shown as SEQ ID
NO:20 and nucleic acid sequence shown as SEQ ID NO:21). There is a polyadenylation signal
that is defined by nucleotides 43195 - 44927 of Exon 1 (relative to SEQ ID NO:1). The
coding sequence between nucleotides 43390 - 44574 (inc. TAA, found in SEQ ID NO:1)
yields a 394 amino acid protein.
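
The stated protein length can be checked against the coding coordinates with simple arithmetic. The sketch below assumes 1-based inclusive positions (an assumption) and a hypothetical helper name; it shows that a coding sequence spanning nucleotides 43390 - 44574, including the TAA stop codon, corresponds to 394 amino acids.

```python
def protein_length_from_cds(start: int, end: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by a coding sequence spanning start..end.

    Assumption: positions are 1-based and inclusive.
    """
    n_nucleotides = end - start + 1
    if n_nucleotides % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of three")
    n_codons = n_nucleotides // 3
    # The stop codon occupies one codon but does not encode an amino acid.
    return n_codons - 1 if includes_stop else n_codons


# GPR35 coding sequence coordinates from the text (stop codon TAA included).
print(protein_length_from_cds(43390, 44574))
```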

Calpain 10 diapain is an atypical calpain and is similar in structural organization
to the other atypical calpains, calpain 5 and calpain 6, in that it has domains I-to-III, lacks
the calmodulin-like Ca2+-binding domain and has a divergent C-terminal domain, domain
T (Dear et al., 1997). Calpains 5, 6 and 10 define a distinct subfamily (FIG. 6). Calpains
are found in all tissues and are processing proteases, cleaving specific substrates at a
limited number of sites, and causing activation or inactivation of protein function. They
have been implicated in the regulation of a variety of cellular functions including
intracellular signaling, proliferation and differentiation, and may be responsible for
insulin-induced down-regulation of insulin receptor substrate-1 (Smith et al., 1996), a key
mediator of insulin action.

Mutations in calpain 3, p94, are the cause of the recessive disorder limb-girdle
muscular dystrophy type 2A, indicating a vital role for proteolysis in determining
normal muscle function (Richard et al., 1995). Calpains have also been implicated in
sexual development in Caenorhabditis elegans, and mutations in the sex determination
gene tra-3, the orthologue of human calpain 5, affect correct sexual development of the
soma and germ cells (Barnes and Hodgkin, 1996).

The results of the present invention indicate that a single nucleotide
polymorphism in intron 3 of the calpain 10 gene affects diabetes susceptibility. The
location of the causal variant within an intron suggests that it might function as an
enhancer affecting regulation of transcription, or perhaps by its effects on alternative
splicing. Diabetes results from defects in insulin secretion and insulin action. Calpain 10
is ubiquitously expressed and thus could affect both processes or, alternatively, have
specific effects on muscle, liver and pancreatic β-cell, the three most important tissues
controlling glucose homeostasis. An understanding of the role of calpain 10 in diabetes
must await the identification of the cell types sensitive to calpain 10 activity and its
specific substrates.

The present invention reveals a new regulatory network involved in the 
pathophysiology of type 2 diabetes. This network likely includes, in addition to calpain 10,
its substrates, inhibitors and activators. Calpain 10 does not appear to act alone in 
determining susceptibility to type 2 diabetes but rather interacts with the product of a 
gene on chromosome 15. The inventors have shown that NIDDM1 acts in concert with 
an unknown gene on chromosome 15 to increase susceptibility to type 2 diabetes in 
Mexican Americans, and this combination may be a primary determinant of susceptibility 
in 45% of families in this community. 

The gene product on chromosome 15 could be a substrate, inhibitor or activator of 
calpain 10. Given that the present invention has identified the sequence of calpain 10, the 
compositions of the present invention will allow one of skill in the art to identify the gene 
product on chromosome 15 that has long been sought after as a gene involved in diabetes. 

Furthermore, the identification of the causal variant at NIDDM1 also allows the 
inventors to re-examine the linkage studies in other populations. The G-allele at 
UCSNP-43 has a frequency in unrelated nondiabetic non-Hispanic whites (German 
ancestry), Asians (Japanese) and African Americans of 0.71, 0.94 and 0.90, respectively. 
Its high frequency in Asians and African Americans indicates that its effects on 
susceptibility may not be detected by linkage analysis. Its effect on susceptibility in
non-Hispanic whites needs to be re-evaluated taking into account the interaction with the
diabetes gene on chromosome 15. 

Calpain 10 is the third example of a protease contributing to the development of
diabetes. Mutations in prohormone-processing carboxypeptidase E and prohormone
convertase-1 are associated with diabetes and obesity (Naggert et al., 1995; Jackson et
al., 1997). The mutation in the carboxypeptidase E gene is responsible for impaired
glucose tolerance or diabetes in a mammalian animal model system. The
carboxypeptidase E gene product is known to cleave C-terminal amino acid residues from 
substrate proteins, and is a principal enzyme involved in the processing of precursor 
forms of peptide hormones into their mature, biologically active forms. The B-chain of
insulin, immediately following the excision of the connector (C-) peptide from the
proinsulin precursor by endopeptidase action, is a carboxypeptidase E substrate.
Carboxypeptidase E activity is required to remove a diarginyl remnant of the C-peptide at
the C-terminus of the insulin β chain. Without such removal, the C-terminal extended
form has only a fraction of the activity of the processed form. Further, a defect in
carboxypeptidase E activity leads to an accumulation of proinsulin which also has low
biological activity.

Given the great diversity of proteases and the myriad functions they perform, 
additional proteases may be implicated in diabetes susceptibility. In this regard, it has 
been noted that one of the side effects of the long-term use of protease inhibitors in 
patients with AIDS is diabetes (Flexner, 1998). Since it is a variant in the calpain 10
gene that is associated with diabetes, the inventors suggest that the protein encoded by
this gene be called diapain-1 (diabetes calpain). As such the terms "calpain 10" and
"diapain-1" are used interchangeably herein.

It is likely that additional diapain-1-like proteases may be identified that are
intrinsically involved in diabetes. As discussed above, there are numerous isoforms of
calpain 10 that are formed as a consequence of alternative splicing of calpain 10 mRNA.
These include calpain 10a, calpain 10b, calpain 10c, calpain 10d,
calpain 10e, calpain 10f, calpain 10g, and calpain 10h, as described in Table 1 and FIG. 1.
Additional alternative splicing may provide other calpain 10 proteins. Further, it is
contemplated that other calpain or calpain-like proteins will be identified that are
involved in the development of diabetes or any other manifestation of diabetes. Given the 
recent findings with regard to different factors involved in the regulation of expression 
and activity of the HNF transcription factors which are responsible for susceptibility to 
MODY1, MODY3 and MODY5 (WO 98/11254), it is likely that another such pathway
may be defined for type 2 diabetes, with calpain 10 being one of the key factors in the
pathway. From the inventors' investigations, it is conceivable that aberrations at any
point along such pathway or any factors affecting the pathway directly or indirectly will
result in β-cell dysfunction and diabetes mellitus, either as type 2 diabetes, another
manifestation of type 2 diabetes or perhaps in diabetes as a whole (i.e., type 1 and type 2
diabetes). 

With respect to calpains, or indeed proteases in general, being involved in diabetic 
states other than type 2 diabetes, it is of note that one of the side effects of the long-term
use of protease inhibitors in patients with AIDS is diabetes (Flexner, 1998). Thus, it is an
aspect of the present invention to contemplate therapeutic strategies that provide
amelioration of a diabetes-type phenotype by providing therapies that alleviate an
aberration in protease gene expression, protein activity or function. These therapies may
be based on gene therapy to provide wild-type proteases, or may employ modulators of
proteases (calpains and diapains) identified according to the present invention. Such
modulators may be small molecule inhibitors, antibody compositions or any other 
composition that will alleviate, overcome or otherwise circumvent the deleterious 
effects of protease mutations in diabetes. 

C. Linkage Analysis of Increased Susceptibility to Type 2 Diabetes

In one aspect of the present invention, the inventors describe an approach to 
assessing the evidence for statistical interactions between unlinked regions that allows 
multipoint allele-sharing analysis to take the evidence for linkage at one region into 
account in assessing the evidence for linkage over the rest of the genome. Using this 
method, the inventors show that the interaction of genes on chromosomes 2 (NIDDM1)
and 15 (near CYP19) makes a major contribution to susceptibility to type 2 diabetes in
Mexican Americans from Starr County, Texas. 

The correlation in scores assessing the evidence for linkage across families
(e.g., non-parametric linkage scores - NPL (Kruglyak et al., 1996)) can be used to
determine preliminary evidence for statistical interaction between unlinked regions.
Unless the regions chosen for study actually contain loci which contribute susceptibility
to disease, there is no expectation that NPL scores from unlinked regions will be
correlated, even if the regions are selected because they show some evidence for linkage.
However, there is not always a simple correspondence between the biological interactions
of genes and the statistical interactions that can be detected. For example, while some
models of epistatic interaction generate positive correlations between NPL scores from
the regions to which the interacting loci map, many models of biological interaction
would not generate detectable correlations. Moreover, negative correlations between
regions can be generated when non-overlapping sets of families provide evidence for
linkage due to genetic heterogeneity, in the absence of biological interactions between the
susceptibility loci from these regions. Thus, finding significant correlations between NPL
scores at unlinked regions provides additional evidence that loci from those regions
contribute to disease susceptibility and generates insight into the models most consistent
with the type of correlation (positive or negative) observed.
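
By way of illustration only, the correlation of per-family NPL scores at two unlinked regions, together with the t statistic used by Pearson's correlation test, may be sketched as follows. The function names and the example scores are hypothetical and are not taken from the data described herein.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two lists of equal length with at least 3 families")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary across families")
    return sxy / math.sqrt(sxx * syy)


def correlation_t_statistic(r, n_families):
    """t statistic (n - 2 degrees of freedom) for testing zero correlation."""
    return r * math.sqrt((n_families - 2) / (1.0 - r * r))


# Hypothetical per-family NPL scores at two unlinked regions.
npl_region_a = [0.5, -0.2, 1.1, 0.0, 0.8]
npl_region_b = [0.4, -0.1, 0.9, 0.2, 0.3]
r = pearson_r(npl_region_a, npl_region_b)
print(r, correlation_t_statistic(r, len(npl_region_a)))
```

A positive r is the pattern expected under some epistatic models, and a negative r is the pattern expected under heterogeneity, as discussed above.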

Once preliminary studies provide evidence for statistical interaction between
regions, it is possible to incorporate linkage evidence from one region in assessing
evidence for linkage at a second region (or multiple regions) by weighting families
according to their evidence for linkage. The multipoint allele-sharing approach described
by Kruglyak et al. (1996) and extended by Kong and Cox (1997) to efficiently utilize
incomplete information was designed to allow families to be weighted individually, but
these original implementations assigned each family equal weight. The inventors' newest
extension (GENEHUNTER-PLUS v2.0) allows users to specify individual weights for
each family based, for example, on pedigree structure, number of affecteds, and/or their
evidence for linkage at a particular location. Family-specific weighting can be used to
model positive interactions (such as epistasis) by assigning weight 0 to families with 0 or
negative linkage scores and weight 1 to families with positive linkage scores (weight_0-1),
or to model heterogeneity by assigning weight 1 to families with negative linkage scores
and weight 0 to families with 0 or positive linkage scores (weight_1-0). More complex
family-specific weights proportional to the evidence for linkage (weight_prop) can also be
constructed.
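
The two binary weighting schemes described above may be sketched as follows. This is a minimal illustration, not the GENEHUNTER-PLUS implementation; the function name and scheme labels are hypothetical, and the proportional scheme is omitted because its exact form is not specified here.

```python
def family_weight(linkage_score: float, scheme: str) -> float:
    """Return a family-specific weight from the family's linkage score.

    scheme "0-1": weight 1 for positive scores, else 0 (positive interaction).
    scheme "1-0": weight 1 for negative scores, else 0 (heterogeneity).
    """
    if scheme == "0-1":
        return 1.0 if linkage_score > 0 else 0.0
    if scheme == "1-0":
        return 1.0 if linkage_score < 0 else 0.0
    raise ValueError(f"unknown weighting scheme: {scheme!r}")


# Hypothetical per-family linkage scores at the conditioning region.
scores = [1.2, 0.0, -0.7]
print([family_weight(s, "0-1") for s in scores])  # [1.0, 0.0, 0.0]
print([family_weight(s, "1-0") for s in scores])  # [0.0, 0.0, 1.0]
```

Note that a family with a score of exactly 0 receives weight 0 under both schemes.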

Determining the significance of apparent interactions requires care. The nominal
P-values associated with the sample correlations are calculated using Pearson's
correlation test (a t-test), which is likely to be appropriate for large sample sizes. The
significance associated with the increased lod when evidence for linkage at a particular
location is taken into account using family-specific weights can be determined either by
simulation, or by using a conservative χ² test with one degree of freedom as follows. If
the inventors consider a more general one-degree-of-freedom family of weights in which
weight_0-1 and weight_1-0 are the two extremes, then the increase over baseline of the MLS
for the family weighting yielding the maximum lod, multiplied by 2 ln(10), is
asymptotically distributed as a χ² with one degree of freedom under the null hypothesis of
no interaction. The test is conservative because the inventors are not actually maximizing
the lod with respect to the weighting factors, and currently consider only a few
family-specific weights. However, interpretation of such studies still requires taking multiple
comparisons into account.
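
The χ² approximation described above may be illustrated with the following sketch, in which the increase in lod is converted to a χ² statistic and a nominal P-value. The function name and the example lod values are hypothetical; the upper tail of a χ² distribution with one degree of freedom is computed from the complementary error function.

```python
import math


def interaction_p_value(baseline_mls: float, weighted_mls: float) -> float:
    """Nominal P-value for the increase in lod under a 1-df chi-square."""
    delta_lod = weighted_mls - baseline_mls
    if delta_lod <= 0:
        return 1.0  # no increase over baseline, so no evidence of interaction
    chi_square = 2.0 * math.log(10.0) * delta_lod  # math.log is the natural log
    # For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi_square / 2.0))


# Hypothetical example: the lod rises from 2.0 to 3.0 with family weighting.
print(interaction_p_value(2.0, 3.0))
```

Under these assumptions, an increase of one lod unit corresponds to a nominal P-value of roughly 0.03 before any correction for multiple comparisons.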

To limit the Bonferroni adjustment, it seems prudent to focus on the top signals
from the primary linkage analysis and perhaps a small number of candidate regions. Even
with this adjustment, such secondary analyses may increase the overall false positive rate
because they are designed to strengthen the support for regions that do not themselves
meet genome-wide criteria for significance. Given that, and the absence of information
on the a priori likelihood of such interactions, it is appropriate to use more stringent
criteria for determining significance, i.e., 0.01 instead of 0.05. The evidence for
interaction between the CYP19 - NIDDM1 regions meets these criteria after the
Bonferroni adjustment whereas that between NIDDM1 and HNF-1α does not (Table 6).
More research will be necessary to determine whether such statistical interactions will be
common in complex traits, and how criteria that have been suggested for assessing
genome-wide significance (Lander and Kruglyak, 1995) should be modified when the
evidence for linkage at multiple susceptibility loci is considered simultaneously.
Example 5 herein describes the data generated from linkage between CYP19 and
NIDDM1.
D. Nucleic Acids

As described in the Examples, the present invention discloses the calpain 10 gene at 
the NIDDM1 locus of chromosome 2. Mutations in this gene are responsible for 
susceptibility to type 2 diabetes. The gene at this locus has been designated as a calpain-like 
protein, calpain 10 or otherwise referred to herein as diapain-1. In particular, the nucleotide
variant showing all the evidence for linkage with type 2 diabetes, UCSNP-43, is located
in intron 3 of the calpain 10 gene, CAPN10 (see FIG. 4), 746 bp downstream of the splice 
donor site and 176 bp upstream of the splice acceptor site. The molecular mechanism by 
which the G-to-A polymorphism at UCSNP-43 affects susceptibility to type 2 diabetes is 
unclear. As shown in FIG. 5, there is alternative splicing of intron 3. 

In one embodiment of the present invention, the nucleic acid sequences disclosed 
herein find utility as hybridization probes or amplification primers. In certain embodiments, 
these probes and primers consist of oligonucleotide fragments. Such fragments should be 
of sufficient length to provide specific hybridization to an RNA or DNA sample extracted 
from tissue. The sequences typically will be 10-20 nucleotides, but may be longer. Longer
sequences, e.g., 40, 50, 100, 500 and even up to full length, are preferred for certain 
embodiments. 

Nucleic acid molecules having contiguous stretches of about 10, 15, 17, 20, 30, 40,
50, 60, 75, 100 or 500 nucleotides from a sequence selected from the group listed in
SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID
NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID
NO:21, fragments thereof, mRNAs and cDNAs encoding any of calpains 10a-10h, or any
other calpain 10, and mutants of each are contemplated. Molecules that are complementary
to the above mentioned sequences and that bind to these sequences under high stringency
conditions also are contemplated. SEQ ID NO:21 is the human G protein coupled receptor
within the NIDDM1 region. SEQ ID NO:19 is the mouse calpain 10 protease. These
probes will be useful in a variety of hybridization embodiments, such as Southern and
northern blotting. In some cases, it is contemplated that probes may be used that hybridize
to multiple target sequences without compromising their ability to effectively diagnose
diabetes and in particular, type 2 diabetes. In certain embodiments, it is contemplated that
multiple probes may be used for hybridization to a single sample.

Various probes and primers can be designed around the disclosed nucleotide
sequences. Primers may be of any length but, typically, are 10-20 bases in length. By
assigning numeric values to a sequence, for example, the first residue is 1, the second
residue is 2, etc., an algorithm defining all primers can be proposed:

n to n + y

where n is an integer from 1 to the last number of the sequence and y is the length of
the primer minus one, where n + y does not exceed the last number of the sequence. Thus,
for a 10-mer, the probes correspond to bases 1 to 10, 2 to 11, 3 to 12 ... and so on. For a
15-mer, the probes correspond to bases 1 to 15, 2 to 16, 3 to 17 ... and so on. For a 20-mer, the
probes correspond to bases 1 to 20, 2 to 21, 3 to 22 ... and so on.
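
The "n to n + y" algorithm may be sketched as a sliding window over the sequence. The function name and the short example sequence below are hypothetical and are used for illustration only; positions are 1-based and inclusive, as in the text.

```python
def enumerate_primers(sequence: str, primer_length: int):
    """Yield (n, n + y, primer) for every window, where y = primer_length - 1."""
    y = primer_length - 1
    last = len(sequence)  # the last number of the sequence
    # n runs from 1 while n + y does not exceed the last number.
    for n in range(1, last - y + 1):
        yield n, n + y, sequence[n - 1 : n + y]


# Hypothetical 12-base sequence; a 10-mer window gives bases 1-10, 2-11 and 3-12.
for start, end, primer in enumerate_primers("ACGTACGTACGT", 10):
    print(start, end, primer)
```

A sequence of N residues therefore yields N - y primers of a given length.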

The last number of the sequence in the algorithm above is 49,136 for
the calpain 10 gene. The corresponding value for a cDNA encoding any of calpains 10a-10h may be
calculated by adding up the number of nucleotides in the exons that are spliced to form the
mRNA from which the particular calpain 10 is expressed.

The use of a hybridization probe of between 17 and 100 nucleotides in length allows
the formation of a duplex molecule that is both stable and selective. Molecules having
complementary sequences over stretches greater than 20 bases in length are generally
preferred, in order to increase stability and selectivity of the hybrid, and thereby improve the
quality and degree of particular hybrid molecules obtained. One will generally prefer to
design nucleic acid molecules having stretches of 20 to 30 nucleotides, or even longer
where desired. Such fragments may be readily prepared by, for example, directly
synthesizing the fragment by chemical means or by introducing selected sequences into
recombinant vectors for recombinant production.

Accordingly, the nucleotide sequences of the invention may be used for their ability 
to selectively form duplex molecules with complementary stretches of genes or RNAs or to 
provide primers for amplification of DNA or RNA from tissues. Depending on the 
application envisioned, one will desire to employ varying conditions of hybridization to
achieve varying degrees of selectivity of probe towards target sequence. 

For applications requiring high selectivity, one will typically desire to employ
relatively stringent conditions to form the hybrids, e.g., one will select relatively low salt
and/or high temperature conditions, such as provided by about 0.02 M to about 0.10 M
NaCl at temperatures of about 50°C to about 70°C. Such high stringency conditions tolerate
little, if any, mismatch between the probe and the template or target strand, and would be
particularly suitable for isolating specific genes or detecting specific mRNA transcripts. It
is generally appreciated that conditions can be rendered more stringent by the addition of
increasing amounts of formamide.

For certain applications, for example, substitution of nucleotides by site-directed
mutagenesis, it is appreciated that lower stringency conditions are required. Under these
conditions, hybridization may occur even though the sequences of probe and target strand
are not perfectly complementary, but are mismatched at one or more positions. Conditions
may be rendered less stringent by increasing salt concentration and decreasing temperature.
For example, a medium stringency condition could be provided by about 0.1 to 0.25 M
NaCl at temperatures of about 37°C to about 55°C, while a low stringency condition could
be provided by about 0.15 M to about 0.9 M salt, at temperatures ranging from about 20°C
to about 55°C. Thus, hybridization conditions can be readily manipulated depending on the
desired results.
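
The approximate salt and temperature ranges recited above may be collected in a small lookup, sketched below. Because the recited medium and low stringency ranges overlap, the illustrative function returns every category whose ranges contain a given condition; the names used are hypothetical and the boundaries are the approximate values stated in the text.

```python
# (salt range in mol/L, temperature range in degrees C), as recited above.
STRINGENCY_RANGES = {
    "high": ((0.02, 0.10), (50.0, 70.0)),
    "medium": ((0.10, 0.25), (37.0, 55.0)),
    "low": ((0.15, 0.90), (20.0, 55.0)),
}


def matching_stringency(salt_molar: float, temp_c: float):
    """Return the stringency categories whose ranges include the condition."""
    return [
        name
        for name, ((salt_lo, salt_hi), (temp_lo, temp_hi)) in STRINGENCY_RANGES.items()
        if salt_lo <= salt_molar <= salt_hi and temp_lo <= temp_c <= temp_hi
    ]


print(matching_stringency(0.05, 65.0))  # ['high']
print(matching_stringency(0.20, 45.0))  # ['medium', 'low']
```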

In other embodiments, hybridization may be achieved under conditions of, for
example, 50 mM Tris-HCl (pH 8.3), 75 mM KCl, 3 mM MgCl2, 1.0 mM dithiothreitol, at
temperatures between approximately 20°C and about 37°C. Other hybridization conditions
utilized could include approximately 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 1.5 mM
MgCl2, at temperatures ranging from approximately 40°C to about 72°C.

In certain embodiments, it will be advantageous to employ nucleic acid sequences of 
the present invention in combination with an appropriate means, such as a label, for
determining hybridization. A wide variety of appropriate indicator means are known in the 
art, including fluorescent, radioactive, enzymatic or other ligands, such as avidin/biotin, 
which are capable of being detected. In preferred embodiments, one may desire to employ a 
fluorescent label or an enzyme tag such as urease, alkaline phosphatase or peroxidase, 
instead of radioactive or other environmentally undesirable reagents. In the case of enzyme
tags, colorimetric indicator substrates are known that can be employed to provide a 
detection means visible to the human eye or spectrophotometrically, to identify specific 
hybridization with complementary nucleic acid-containing samples. 

In general, it is envisioned that the hybridization probes described herein will be
useful both as reagents in solution hybridization, as in PCR, for detection of expression of
corresponding genes, as well as in embodiments employing a solid phase. In embodiments
involving a solid phase, the test DNA (or RNA) is adsorbed or otherwise affixed to a
selected matrix or surface. This fixed, single-stranded nucleic acid is then subjected to
hybridization with selected probes under desired conditions. The selected conditions will
depend on the particular circumstances based on the particular criteria required (depending,
for example, on the G+C content, type of target nucleic acid, source of nucleic acid, size of
hybridization probe, etc.). Following washing of the hybridized surface to remove
non-specifically bound probe molecules, hybridization is detected, or even quantified, by means
of the label.

It will be understood that this invention is not limited to the particular probes 
disclosed herein and particularly is intended to encompass at least nucleic acid sequences 
that are hybridizable to the disclosed sequences or are functional analogs of these 
sequences.

For applications in which the nucleic acid segments of the present invention are 
incorporated into vectors, such as plasmids, cosmids or viruses, these segments may be 
combined with other DNA sequences, such as promoters, polyadenylation signals, 
restriction enzyme sites, multiple cloning sites, other coding segments, and the like, such
that their overall length may vary considerably. It is contemplated that a nucleic acid 
fragment of almost any length may be employed, with the total length preferably being 
limited by the ease of preparation and use in the intended recombinant DNA protocol. 

DNA segments encoding a specific gene may be introduced into recombinant host
cells and employed for expressing a specific structural or regulatory protein. Alternatively,
through the application of genetic engineering techniques, subportions or derivatives of
selected genes may be employed. Upstream regions containing regulatory regions such as
promoter regions may be isolated and subsequently employed for expression of the selected
gene.

In an alternative embodiment, the diapain-1 encoding nucleic acids employed may
actually encode antisense constructs that hybridize, under intracellular conditions, to a
diapain-1 encoding or other calpain encoding nucleic acid. The term "antisense
construct" is intended to refer to nucleic acids, preferably oligonucleotides, that are
complementary to the base sequences of a target DNA or RNA. Antisense
oligonucleotides, when introduced into a target cell, specifically bind to their target
nucleic acid and interfere with transcription, RNA processing, transport, translation
and/or stability.

Antisense constructs may be designed to bind to the promoter and other control
regions, exons, introns or even exon-intron boundaries of a gene. Antisense RNA
constructs, or DNA encoding such antisense RNAs, may be employed to inhibit gene
transcription or translation or both within a host cell, either in vitro or in vivo, such as
within a host animal, including a human subject. Nucleic acid sequences which comprise
"complementary nucleotides" are those which are capable of base-pairing according to
the standard Watson-Crick complementarity rules. That is, the larger purines will base
pair with the smaller pyrimidines to form combinations of guanine paired with cytosine
(G:C) and adenine paired with either thymine (A:T), in the case of DNA, or
uracil (A:U), in the case of RNA. Inclusion of less common bases such as
inosine, 5-methylcytosine, 6-methyladenine, hypoxanthine and others in hybridizing
sequences does not interfere with pairing.

As used herein, the term "complementary" means nucleic acid sequences that are
substantially complementary over their entire length and have very few base mismatches.
For example, nucleic acid sequences of fifteen bases in length may be termed
complementary when they have a complementary nucleotide at thirteen or fourteen
positions, with only one or two mismatches. Naturally, nucleic acid sequences which are
"completely complementary" will be nucleic acid sequences which are entirely
complementary throughout their entire length and have no base mismatches.
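
The distinction between "complementary" and "completely complementary" sequences may be illustrated by counting Watson-Crick mismatches between two antiparallel strands. The sketch below is illustrative only: the function names are hypothetical, both strands are assumed to be written 5' to 3', and the default tolerance of two mismatches simply mirrors the fifteen-base example above.

```python
# Watson-Crick pairs, covering DNA (A:T) and RNA (A:U).
WATSON_CRICK_PAIRS = {
    ("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"),
}


def count_mismatches(strand: str, other: str) -> int:
    """Count positions that fail to pair when the strands align antiparallel."""
    if len(strand) != len(other):
        raise ValueError("strands must have the same length")
    paired = zip(strand.upper(), reversed(other.upper()))
    return sum((x, y) not in WATSON_CRICK_PAIRS for x, y in paired)


def is_complementary(strand: str, other: str, max_mismatches: int = 2) -> bool:
    """True when the number of mismatches does not exceed the tolerance."""
    return count_mismatches(strand, other) <= max_mismatches


print(count_mismatches("ATGC", "GCAT"))  # 0, completely complementary
print(count_mismatches("ATGC", "GCAA"))  # 1
```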

Other sequences with lower degrees of homology also are contemplated. For 
example, an antisense construct which has limited regions of high homology, but also 
contains a non-homologous region (e.g., a ribozyme) could be designed. These 
molecules, though having less than 50% homology, would bind to target sequences under
appropriate conditions. 

While all or part of the diapain-1 gene sequence may be employed in the context
of antisense construction, short oligonucleotides are easier to make and increase in vivo
accessibility. However, both binding affinity and sequence specificity of an antisense
oligonucleotide to its complementary target increase with increasing length. It is
contemplated that antisense oligonucleotides of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 or more base pairs will be used. One can
readily determine whether a given antisense nucleic acid is effective at targeting of the
corresponding host cell gene simply by testing the constructs in vitro to determine
whether the endogenous gene's function is affected or whether the expression of related
genes having complementary sequences is affected.

In certain embodiments, one may wish to employ antisense constructs which
include other elements, for example, those which include C-5 propyne pyrimidines.
Oligonucleotides which contain C-5 propyne analogues of uridine and cytidine have been
shown to bind RNA with high affinity and to be potent antisense inhibitors of gene
expression (Wagner et al., 1993).

Throughout this application, the term "expression construct" is meant to include 
any type of genetic construct containing a nucleic acid coding for a gene product in which 
part or all of the nucleic acid encoding sequence is capable of being transcribed. The 
transcript may be translated into a protein, but it need not be. Thus, in certain 
embodiments, expression includes both transcription of a gene and translation of an RNA
into a gene product. In other embodiments, expression only includes transcription of the 
nucleic acid, for example, to generate antisense constructs. 

In preferred embodiments, the nucleic acid is under transcriptional control of a
promoter. A "promoter" refers to a DNA sequence recognized by the synthetic
machinery of the cell, or introduced synthetic machinery, required to initiate the specific
transcription of a gene. The phrase "under transcriptional control" means that the
promoter is in the correct location and orientation in relation to the nucleic acid to control
RNA polymerase initiation and expression of the gene.

The term promoter will be used here to refer to a group of transcriptional control 
modules that are clustered around the initiation site for RNA polymerase II. Much of the 
thinking about how promoters are organized derives from analyses of several viral 
promoters, including those for the HSV thymidine kinase (tk) and SV40 early 
transcription units. These studies, augmented by more recent work, have shown that
promoters are composed of discrete functional modules, each consisting of approximately 
7-20 bp of DNA, and containing one or more recognition sites for transcriptional 
activator or repressor proteins. 

At least one module in each promoter functions to position the start site for RNA
synthesis. The best known example of this is the TATA box, but in some promoters
lacking a TATA box, such as the promoter for the mammalian terminal deoxynucleotidyl
transferase gene and the promoter for the SV40 late genes, a discrete element overlying
the start site itself helps to fix the place of initiation.

Additional promoter elements regulate the frequency of transcriptional initiation.
Typically, these are located in the region 30-110 bp upstream of the start site, although a
number of promoters have recently been shown to contain functional elements
downstream of the start site as well. The spacing between promoter elements frequently
is flexible, so that promoter function is preserved when elements are inverted or moved
relative to one another. In the tk promoter, the spacing between promoter elements can
be increased to 50 bp apart before activity begins to decline. Depending on the promoter,
it appears that individual elements can function either co-operatively or independently to
activate transcription.

The particular promoter that is employed to control the expression of a nucleic
acid is not believed to be critical, so long as it is capable of expressing the nucleic acid in 
the targeted cell. Thus, where a human cell is targeted, it is preferable to position the 
nucleic acid coding region adjacent to and under the control of a promoter that is capable 
of being expressed in a human cell. Generally speaking, such a promoter might include
either a human or viral promoter. 

Preferred promoters include those derived from HSV and calpain 10. Additionally,
other calpain promoters also may be useful. The sequence of the human
calpain 10 gene, including the promoter, has also been identified by the present inventors and
deposited in the GenBank database. Another preferred embodiment is the tetracycline
controlled promoter.

In various other embodiments, the human cytomegalovirus (CMV) immediate
early gene promoter, the SV40 early promoter and the Rous sarcoma virus long terminal
repeat can be used to obtain high-level expression of transgenes. The use of other viral or
mammalian cellular or bacterial phage promoters which are well-known in the art to
achieve expression of a transgene is contemplated as well, provided that the levels of
expression are sufficient for a given purpose. Tables 2 and 3 list several
elements/promoters which may be employed, in the context of the present invention, to
regulate the expression of a transgene. This list is not intended to be exhaustive of all the
possible elements involved in the promotion of transgene expression but, merely, to be
exemplary thereof.

Enhancers were originally detected as genetic elements that increased
transcription from a promoter located at a distant position on the same molecule of DNA.
This ability to act over a large distance had little precedent in classic studies of
prokaryotic transcriptional regulation. Subsequent work showed that regions of DNA
with enhancer activity are organized much like promoters. That is, they are composed of
many individual elements, each of which binds to one or more transcriptional proteins.

The basic distinction between enhancers and promoters is operational. An
enhancer region as a whole must be able to stimulate transcription at a distance; this need 
not be true of a promoter region or its component elements. On the other hand, a 
promoter must have one or more elements that direct initiation of RNA synthesis at a 
particular site and in a particular orientation, whereas enhancers lack these specificities. 
Promoters and enhancers are often overlapping and contiguous, often seeming to have a 
very similar modular organization. 

Additionally any promoter/enhancer combination (as per the Eukaryotic Promoter 
Data Base EPDB) could also be used to drive expression of a transgene. Use of a T3, T7 
or SP6 cytoplasmic expression system is another possible embodiment. Eukaryotic cells 
can support cytoplasmic transcription from certain bacterial promoters if the appropriate 
bacterial polymerase is provided, either as part of the delivery complex or as an additional 
genetic expression construct. 

Table 2

PROMOTER

Immunoglobulin Heavy Chain
Immunoglobulin Light Chain
T-Cell Receptor
HLA DQ α and DQ β
β-Interferon
Interleukin-2
Interleukin-2 Receptor
MHC Class II 5
MHC Class II HLA-DRα
β-Actin
Muscle Creatine Kinase
Prealbumin (Transthyretin)
Elastase I
Metallothionein
Collagenase
Albumin Gene
α-Fetoprotein
α-Globin
β-Globin
c-fos
c-HA-ras
Insulin
Neural Cell Adhesion Molecule (NCAM)
α1-Anti-trypsin
H2B (TH2B) Histone
Mouse or Type I Collagen
Glucose-Regulated Proteins (GRP94 and GRP78)
Rat Growth Hormone
Human Serum Amyloid A (SAA)
Troponin I (TN I)
Platelet-Derived Growth Factor
Duchenne Muscular Dystrophy
SV40
Polyoma
Retroviruses
Papilloma Virus
Hepatitis B Virus
Human Immunodeficiency Virus
Cytomegalovirus
Gibbon Ape Leukemia Virus

Table 3

Element                                   Inducer

MT II                                     Phorbol Ester (TPA), Heavy metals
MMTV (mouse mammary tumor virus)          Glucocorticoids
β-Interferon                              poly(rI)X, poly(rc)
Adenovirus 5 E2                           E1a
c-jun                                     Phorbol Ester (TPA), H2O2
Collagenase                               Phorbol Ester (TPA)
Stromelysin                               Phorbol Ester (TPA), IL-1
SV40                                      Phorbol Ester (TPA)
Murine MX Gene                            Interferon, Newcastle Disease Virus
GRP78 Gene                                A23187
α-2-Macroglobulin                         IL-6
Vimentin                                  Serum
MHC Class I Gene H-2kB                    Interferon
HSP70                                     E1a, SV40 Large T Antigen
Proliferin                                Phorbol Ester-TPA
Tumor Necrosis Factor                     PMA
Thyroid Stimulating Hormone α Gene        Thyroid Hormone

Use of the baculovirus system will involve high level expression from the
powerful polyhedrin promoter.

One will typically include a polyadenylation signal to effect proper
polyadenylation of the transcript. The nature of the polyadenylation signal is not believed
to be crucial to the successful practice of the invention, and any such sequence may be
employed. Preferred embodiments include the SV40 polyadenylation signal and the
bovine growth hormone polyadenylation signal, which are convenient and known to function
well in various target cells. Also contemplated as an element of the expression cassette is a
terminator. These elements can serve to enhance message levels and to minimize read
through from the cassette into other sequences.

A specific initiation signal also may be required for efficient translation of coding
sequences. These signals include the ATG initiation codon and adjacent sequences.
Exogenous translational control signals, including the ATG initiation codon, may need to
be provided. One of ordinary skill in the art would readily be capable of determining this
and providing the necessary signals. It is well known that the initiation codon must be
"in-frame" with the reading frame of the desired coding sequence to ensure translation of
the entire insert. The exogenous translational control signals and initiation codons can be
either natural or synthetic. The efficiency of expression may be enhanced by the
inclusion of appropriate transcription enhancer elements (Bittner et al., 1987).

In various embodiments of the invention, the expression construct may comprise a
virus or engineered construct derived from a viral genome. The ability of certain viruses
to enter cells via receptor-mediated endocytosis, to integrate into the host cell genome
and to express viral genes stably and efficiently has made them attractive candidates for the
transfer of foreign genes into mammalian cells (Ridgeway, 1988; Nicolas and Rubenstein,
1988; Baichwal and Sugden, 1986; Temin, 1986). The first viruses used as vectors were
DNA viruses including the papovaviruses (simian virus 40, bovine papilloma virus, and
polyoma) (Ridgeway, 1988; Baichwal and Sugden, 1986), adenoviruses (Ridgeway,
1988; Baichwal and Sugden, 1986) and adeno-associated viruses. Retroviruses also are
attractive gene transfer vehicles (Nicolas and Rubenstein, 1988; Temin, 1986) as are
vaccinia virus (Ridgeway, 1988) and adeno-associated virus (Ridgeway, 1988). Such
vectors may be used (i) to transform cell lines in vitro for the purpose of expressing
proteins of interest or (ii) to transform cells in vitro or in vivo to provide therapeutic
polypeptides in a gene therapy scenario.

In some embodiments, the vector is HSV. Because HSV is neurotropic, it has
generated considerable interest in treating nervous system disorders. Since
insulin-secreting pancreatic β-cells share many features with neurons, HSV may be useful for
delivering genes to β-cells and for gene therapy of diabetes. Moreover, the ability of
HSV to establish latent infections in non-dividing neuronal cells without integrating into
the host cell chromosome or otherwise altering the host cell's metabolism, along with the
existence of a promoter that is active during latency, makes HSV an attractive vector. And
though much attention has focused on the neurotropic applications of HSV, this vector also
can be exploited for other tissues.

Another factor that makes HSV an attractive vector is the size and organization of
the genome. Because HSV is large, incorporation of multiple genes or expression
cassettes is less problematic than in other smaller viral systems. In addition, the
availability of different viral control sequences with varying performance (temporal,
strength, etc.) makes it possible to control expression to a greater extent than in other
systems. It also is an advantage that the virus has relatively few spliced messages, further
easing genetic manipulations.

HSV also is relatively easy to manipulate and can be grown to high titers. Thus,
delivery is less of a problem, both in terms of volumes needed to attain sufficient MOI
and in a lessened need for repeat dosings.

E. Encoded Proteins 

Once the entire coding sequence of a particular gene has been determined, the gene
can be inserted into an appropriate expression system. In this case, the inventors have
identified diapain-1 as a type 2 diabetes susceptibility gene. The gene can be expressed in
any number of different recombinant DNA expression systems to generate large amounts of
the polypeptide product, which can then be purified and used to vaccinate animals to
generate antisera with which further studies may be conducted.


Examples of expression systems known to the skilled practitioner in the art include
bacteria such as E. coli, yeast such as Saccharomyces cerevisiae and Pichia pastoris,
baculovirus, and mammalian expression systems such as in COS or CHO cells. In one
embodiment, polypeptides are expressed in E. coli and in baculovirus expression systems.
A complete gene can be expressed or, alternatively, fragments of the gene encoding portions
of polypeptide can be produced.

In one embodiment, the gene sequence encoding the polypeptide is analyzed to
detect putative transmembrane sequences. Such sequences are typically very hydrophobic
and are readily detected by the use of standard sequence analysis software, such as DNA
Star (DNA Star, Madison, WI). The presence of transmembrane sequences is often
deleterious when a recombinant protein is synthesized in many expression systems,
especially E. coli, as it leads to the production of insoluble aggregates that are difficult to
renature into the native conformation of the protein. Deletion of transmembrane sequences
typically does not significantly alter the conformation of the remaining protein structure.
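
The hydrophobicity screen described above can be illustrated with a short sketch. It is not the DNA Star implementation; it assumes a simple sliding-window average of the Kyte & Doolittle (1982) hydropathic indices, and the window length (19 residues) and cutoff (1.6) are illustrative assumptions taken from a commonly cited rule of thumb. The function names are hypothetical.

```python
# Kyte & Doolittle (1982) hydropathic indices, keyed by one-letter amino acid code.
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathic index of every full window along the sequence."""
    scores = [KD[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def putative_tm_segments(seq, window=19, threshold=1.6):
    """Return (start, end) ranges (0-based, end exclusive) covered by windows
    whose mean hydropathy exceeds the threshold (assumed cutoff)."""
    segments = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score <= threshold:
            continue
        start, end = i, i + window
        if segments and start <= segments[-1][1]:
            # Overlapping or adjacent windows are merged into one segment.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Hypothetical test sequence: charged flanks around a hydrophobic core.
    demo = "DEKRDEKRDE" + "LIVLLAV" * 3 + "DEKRDEKRDE"
    print(putative_tm_segments(demo))
```

Segments reported by such a scan are only candidates; they would normally be confirmed with dedicated software before being deleted from an expression construct.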

Moreover, transmembrane sequences, being by definition embedded within a
membrane, are inaccessible. Therefore, antibodies to these sequences will not prove useful
for in vivo or in situ studies. Deletion of transmembrane-encoding sequences from the
genes used for expression can be achieved by standard techniques. For example,
fortuitously-placed restriction enzyme sites can be used to excise the desired gene fragment,
or PCR-type amplification can be used to amplify only the desired part of the gene. The
skilled practitioner will realize that such changes must be designed so as not to change the
translational reading frame for downstream portions of the protein-encoding sequence.
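
The reading-frame constraint mentioned above reduces to simple arithmetic: the number of nucleotides removed must be a multiple of three. A minimal sketch follows, assuming the coding sequence is held as a plain string; the function name and the example sequence are hypothetical.

```python
def delete_in_frame(cds, start, end):
    """Remove cds[start:end] (0-based, end exclusive) from a coding sequence.

    The deletion is refused if it would shift the downstream reading frame.
    A deletion that is a multiple of three but not codon-aligned still keeps
    the frame, although it creates one hybrid codon at the junction.
    """
    if not 0 <= start <= end <= len(cds):
        raise ValueError("deletion coordinates fall outside the sequence")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    if (end - start) % 3 != 0:
        raise ValueError("deletion length must be a multiple of 3 nucleotides")
    return cds[:start] + cds[end:]


if __name__ == "__main__":
    # Hypothetical example: ATG GCT CTG CTG GTG AAA TAA, removing CTG CTG GTG.
    print(delete_in_frame("ATGGCTCTGCTGGTGAAATAA", 6, 15))
```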

In one embodiment, computer sequence analysis is used to determine the location of
the predicted major antigenic determinant epitopes of the polypeptide. Software capable of
carrying out this analysis is readily available commercially, for example DNA Star (DNA
Star, Madison, WI). The software typically uses standard algorithms such as the
Kyte/Doolittle or Hopp/Woods methods for locating hydrophilic sequences which are
characteristically found on the surface of proteins and are, therefore, likely to act as
antigenic determinants.



Once this analysis is made, polypeptides can be prepared that contain at least the
essential features of the antigenic determinant and that can be employed in the generation of
antisera against the polypeptide. Minigenes or gene fusions encoding these determinants
can be constructed and inserted into expression vectors by standard methods, for example,
using PCR methodology.

The gene or gene fragment encoding a polypeptide can be inserted into an
expression vector by standard subcloning techniques. In one embodiment, an E. coli
expression vector is used that produces the recombinant polypeptide as a fusion protein,
allowing rapid affinity purification of the protein. Examples of such fusion protein
expression systems are the glutathione S-transferase system (Pharmacia, Piscataway, NJ),
the maltose binding protein system (New England Biolabs, Beverly, MA), the FLAG
system (IBI, New Haven, CT), and the 6xHis system (Qiagen, Chatsworth, CA).

Some of these systems produce recombinant polypeptides bearing only a small
number of additional amino acids, which are unlikely to affect the antigenic ability of the
recombinant polypeptide. For example, both the FLAG system and the 6xHis system add
only short sequences, both of which are known to be poorly antigenic and which do not
adversely affect folding of the polypeptide to its native conformation. Other fusion systems
produce polypeptides where it is desirable to excise the fusion partner from the desired
polypeptide. In one embodiment, the fusion partner is linked to the recombinant
polypeptide by a peptide sequence containing a specific recognition sequence for a protease.
Examples of suitable sequences are those recognized by the Tobacco Etch Virus protease
(Life Technologies, Gaithersburg, MD) or Factor Xa (New England Biolabs, Beverly,
MA).

Recombinant bacterial cells, for example E. coli, are grown in any of a number of
suitable media, for example LB, and the expression of the recombinant polypeptide induced
by adding IPTG to the media or switching incubation to a higher temperature. After
culturing the bacteria for a further period of between 2 and 24 hours, the cells are collected
by centrifugation and washed to remove residual media. The bacterial cells are then lysed,
for example, by disruption in a cell homogenizer and centrifuged to separate the dense
inclusion bodies and cell membranes from the soluble cell components. This centrifugation
can be performed under conditions whereby the dense inclusion bodies are selectively
enriched by incorporation of sugars such as sucrose into the buffer and centrifugation at a
selective speed.

In another embodiment, the expression system used is one driven by the baculovirus
polyhedrin promoter. The gene encoding the polypeptide can be manipulated by standard
techniques in order to facilitate cloning into the baculovirus vector. One baculovirus vector
is the pBlueBac vector (Invitrogen, Sorrento, CA). The vector carrying the gene for the
polypeptide is transfected into Spodoptera frugiperda (Sf9) cells by standard protocols, and
the cells are cultured and processed to produce the recombinant antigen. See Summers et
al., A MANUAL OF METHODS FOR BACULOVIRUS VECTORS AND INSECT CELL
CULTURE PROCEDURES, Texas Agricultural Experimental Station.

As an alternative to recombinant polypeptides, synthetic peptides corresponding to
the antigenic determinants can be prepared. Such peptides are at least six amino acid
residues long, and may contain up to approximately 35 residues, which is the approximate
upper length limit of automated peptide synthesis machines, such as those available from
Applied Biosystems (Foster City, CA). Use of such small peptides for vaccination typically
requires conjugation of the peptide to an immunogenic carrier protein such as hepatitis B
surface antigen, keyhole limpet hemocyanin or bovine serum albumin. Methods for
performing this conjugation are well known in the art.

In one embodiment, amino acid sequence variants of the polypeptide can be
prepared. These may, for instance, be minor sequence variants of the polypeptide that arise
due to natural variation within the population or they may be homologues found in other
species. They also may be sequences that do not occur naturally but that are sufficiently
similar that they function similarly and/or elicit an immune response that cross-reacts with
natural forms of the polypeptide. Sequence variants can be prepared by standard methods
of site-directed mutagenesis such as those described below in the following section.

Amino acid sequence variants of the polypeptide can be substitutional, insertional or 
deletion variants. Deletion variants lack one or more residues of the native protein which 
are not essential for function or immunogenic activity, and are exemplified by the variants 
lacking a transmembrane sequence described above. Another common type of deletion 
variant is one lacking secretory signal sequences or signal sequences directing a protein to 
bind to a particular part of a cell. An example of the latter sequence is the SH2 domain, 
which induces protein binding to phosphotyrosine residues. 

Substitutional variants typically contain the exchange of one amino acid for another 
at one or more sites within the protein, and may be designed to modulate one or more 
properties of the polypeptide such as stability against proteolytic cleavage. Substitutions 
preferably are conservative, that is, one amino acid is replaced with one of similar shape and 
charge. Conservative substitutions are well known in the art and include, for example, the 
changes of: alanine to serine; arginine to lysine; asparagine to glutamine or histidine; 
aspartate to glutamate; cysteine to serine; glutamine to asparagine; glutamate to aspartate; 
glycine to proline; histidine to asparagine or glutamine; isoleucine to leucine or valine; 
leucine to valine or isoleucine; lysine to arginine; methionine to leucine or isoleucine; 
phenylalanine to tyrosine, leucine or methionine; serine to threonine; threonine to serine; 
tryptophan to tyrosine; tyrosine to tryptophan or phenylalanine; and valine to isoleucine or 
leucine. 
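
The conservative changes listed above can be captured in a lookup table, which makes it straightforward to screen a proposed variant. The sketch below encodes exactly the pairs recited in the preceding paragraph using one-letter codes; the function names and example sequences are hypothetical, and proline has no entry because none is recited.

```python
# Conservative replacements as recited above: original residue -> allowed residues.
CONSERVATIVE = {
    "A": "S", "R": "K", "N": "QH", "D": "E", "C": "S", "Q": "N", "E": "D",
    "G": "P", "H": "NQ", "I": "LV", "L": "VI", "K": "R", "M": "LI",
    "F": "YLM", "S": "T", "T": "S", "W": "Y", "Y": "WF", "V": "IL",
}


def is_conservative(original, replacement):
    """True if the replacement is one of the recited conservative changes."""
    return replacement in CONSERVATIVE.get(original, "")


def list_substitutions(wild_type, variant):
    """Compare two equal-length protein sequences and report each difference
    as (1-based position, original, replacement, conservative?)."""
    if len(wild_type) != len(variant):
        raise ValueError("sequences must be the same length")
    return [(i + 1, a, b, is_conservative(a, b))
            for i, (a, b) in enumerate(zip(wild_type, variant)) if a != b]


if __name__ == "__main__":
    # Hypothetical four-residue example.
    print(list_substitutions("MKVL", "MRVD"))
```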

Insertional variants include fusion proteins such as those used to allow rapid
purification of the polypeptide and also can include hybrid proteins containing sequences
from other proteins and polypeptides which are homologues of the polypeptide. For
example, an insertional variant could include portions of the amino acid sequence of the
polypeptide from one species, together with portions of the homologous polypeptide from
another species. Other insertional variants can include those in which additional amino
acids are introduced within the coding sequence of the polypeptide. These typically are
smaller insertions than the fusion proteins described above and are introduced, for example,
into a protease cleavage site.

In one embodiment, major antigenic determinants of the polypeptide are identified
by an empirical approach in which portions of the gene encoding the polypeptide are
expressed in a recombinant host, and the resulting proteins tested for their ability to elicit an
immune response. For example, PCR™ can be used to prepare a range of cDNAs encoding
peptides lacking successively longer fragments of the C-terminus of the protein. The
immunoprotective activity of each of these peptides then identifies those fragments or
domains of the polypeptide that are essential for this activity. Further experiments in which
only a small number of amino acids are removed at each iteration then allow the location
of the antigenic determinants of the polypeptide to be determined.
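
The truncation series described above can be planned on paper before any PCR is performed. The sketch below is illustrative only: it assumes the coding sequence is supplied without its stop codon, removes whole codons from the 3' end in fixed steps, and appends a TAA stop codon to each product. The function name, step size and stop codon choice are assumptions, not part of the described method.

```python
def c_terminal_truncations(cds, step_codons, min_codons=1, stop="TAA"):
    """Return (codons_removed, truncated_cds) pairs for constructs lacking
    successively longer fragments of the C-terminus.

    cds is assumed to be in frame and to exclude its own stop codon.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    if step_codons < 1:
        raise ValueError("step_codons must be at least 1")
    total = len(cds) // 3
    series = []
    removed = step_codons
    while total - removed >= min_codons:
        kept = cds[:(total - removed) * 3]
        series.append((removed, kept + stop))
        removed += step_codons
    return series


if __name__ == "__main__":
    # Hypothetical five-codon sequence, truncated two codons at a time.
    for removed, construct in c_terminal_truncations("ATGGCTAAAGGTCTG", 2):
        print(removed, construct)
```

A coarse step locates the region of interest; repeating the series with a step of one or two codons across that region corresponds to the finer mapping described in the text.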

Another embodiment for the preparation of the polypeptides according to the
invention is the use of peptide mimetics. Mimetics are peptide-containing molecules that
mimic elements of protein secondary structure. See, for example, Johnson et al., "Peptide
Turn Mimetics" in BIOTECHNOLOGY AND PHARMACY, Pezzuto et al., Eds., Chapman
and Hall, New York (1993). The underlying rationale behind the use of peptide mimetics is
that the peptide backbone of proteins exists chiefly to orient amino acid side chains in such
a way as to facilitate molecular interactions, such as those of antibody and antigen. A
peptide mimetic is expected to permit molecular interactions similar to the natural
molecule.

Successful applications of the peptide mimetic concept have thus far focused on
mimetics of β-turns within proteins, which are known to be highly antigenic. Likely β-turn
structure within a polypeptide can be predicted by computer-based algorithms as discussed
above. Once the component amino acids of the turn are determined, peptide mimetics can
be constructed to achieve a similar spatial orientation of the essential elements of the amino
acid side chains.

Modification and changes may be made in the structure of a gene and still obtain a
functional molecule that encodes a protein or polypeptide with desirable characteristics. The
following is a discussion based upon changing the amino acids of a protein to create an
equivalent, or even an improved, second-generation molecule. The amino acid changes may
be achieved by changing the codons of the DNA sequence, according to the following data.

For example, certain amino acids may be substituted for other amino acids in a
protein structure without appreciable loss of interactive binding capacity with structures
such as, for example, antigen-binding regions of antibodies or binding sites on substrate
molecules. Since it is the interactive capacity and nature of a protein that defines that
protein's biological functional activity, certain amino acid substitutions can be made in a
protein sequence, and its underlying DNA coding sequence, and nevertheless obtain a
protein with like properties. It is thus contemplated by the inventors that various changes
may be made in the DNA sequences of genes without appreciable loss of their biological
utility or activity.
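
The relationship between codon changes and amino acid changes can be made concrete with a small translation routine. The sketch below uses the standard codon assignments (RNA alphabet) for the twenty amino acids; the three stop codons are added as an assumption so that translation can terminate. The function name and example sequences are hypothetical.

```python
# Standard codon assignments, written as amino acid -> RNA codons.
CODONS = {
    "A": "GCA GCC GCG GCU", "C": "UGC UGU", "D": "GAC GAU", "E": "GAA GAG",
    "F": "UUC UUU", "G": "GGA GGC GGG GGU", "H": "CAC CAU",
    "I": "AUA AUC AUU", "K": "AAA AAG", "L": "UUA UUG CUA CUC CUG CUU",
    "M": "AUG", "N": "AAC AAU", "P": "CCA CCC CCG CCU", "Q": "CAA CAG",
    "R": "AGA AGG CGA CGC CGG CGU", "S": "AGC AGU UCA UCC UCG UCU",
    "T": "ACA ACC ACG ACU", "V": "GUA GUC GUG GUU", "W": "UGG", "Y": "UAC UAU",
}
CODON_TO_AA = {codon: aa for aa, codons in CODONS.items() for codon in codons.split()}
STOPS = {"UAA", "UAG", "UGA"}  # assumed here so that translation can terminate


def translate(seq):
    """Translate an in-frame RNA or DNA coding sequence up to the first stop."""
    rna = seq.upper().replace("T", "U")
    protein = []
    for i in range(0, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        if codon in STOPS:
            break
        protein.append(CODON_TO_AA[codon])
    return "".join(protein)


if __name__ == "__main__":
    print(translate("AUGGCUGAAUAA"))  # original sequence
    print(translate("AUGGCCGAAUAA"))  # GCU -> GCC, a silent change
    print(translate("AUGGCUGACUAA"))  # GAA -> GAC, glutamate to aspartate
```

The second call shows a codon change that leaves the protein unchanged, and the third a single-codon change that yields a conservative glutamate-to-aspartate substitution.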

In making such changes, the hydropathic index of amino acids may be considered.
The importance of the hydropathic amino acid index in conferring interactive biologic
function on a protein is generally understood in the art (Kyte & Doolittle, 1982).

Table 4

Amino acids                    Codons

Alanine         Ala    A       GCA GCC GCG GCU
Cysteine        Cys    C       UGC UGU
Aspartic acid   Asp    D       GAC GAU
Glutamic acid   Glu    E       GAA GAG
Phenylalanine   Phe    F       UUC UUU
Glycine         Gly    G       GGA GGC GGG GGU
Histidine       His    H       CAC CAU
Isoleucine      Ile    I       AUA AUC AUU
Lysine          Lys    K       AAA AAG
Leucine         Leu    L       UUA UUG CUA CUC CUG CUU
Methionine      Met    M       AUG
Asparagine      Asn    N       AAC AAU
Proline         Pro    P       CCA CCC CCG CCU
Glutamine       Gln    Q       CAA CAG
Arginine        Arg    R       AGA AGG CGA CGC CGG CGU
Serine          Ser    S       AGC AGU UCA UCC UCG UCU
Threonine       Thr    T       ACA ACC ACG ACU
Valine          Val    V       GUA GUC GUG GUU
Tryptophan      Trp    W       UGG
Tyrosine        Tyr    Y       UAC UAU



It is accepted that the relative hydropathic character of the amino acid contributes
to the secondary structure of the resultant protein, which in turn defines the interaction of
the protein with other molecules, for example, enzymes, substrates, receptors, DNA,
antibodies, antigens, and the like.

Each amino acid has been assigned a hydropathic index on the basis of its
hydrophobicity and charge characteristics (Kyte & Doolittle, 1982). These are: isoleucine
(+4.5); valine (+4.2); leucine (+3.8); phenylalanine (+2.8); cysteine/cystine (+2.5);
methionine (+1.9); alanine (+1.8); glycine (-0.4); threonine (-0.7); serine (-0.8);
tryptophan (-0.9); tyrosine (-1.3); proline (-1.6); histidine (-3.2); glutamate (-3.5);
glutamine (-3.5); aspartate (-3.5); asparagine (-3.5); lysine (-3.9); and arginine (-4.5).

It is known in the art that certain amino acids may be substituted by other amino
acids having a similar hydropathic index or score and still result in a protein with similar
biological activity, i.e., still obtain a biologically functionally equivalent protein. In making
such changes, the substitution of amino acids whose hydropathic indices are within ±2 is
preferred, those which are within ±1 are particularly preferred, and those within ±0.5 are
even more particularly preferred.

It is also understood in the art that the substitution of like amino acids can be
made effectively on the basis of hydrophilicity. U.S. Patent 4,554,101, incorporated
herein by reference, states that the greatest local average hydrophilicity of a protein, as
governed by the hydrophilicity of its adjacent amino acids, correlates with a biological
property of the protein.

As detailed in U.S. Patent 4,554,101, the following hydrophilicity values have
been assigned to amino acid residues: arginine (+3.0); lysine (+3.0); aspartate (+3.0 ± 1);
glutamate (+3.0 ± 1); serine (+0.3); asparagine (+0.2); glutamine (+0.2); glycine (0);
threonine (-0.4); proline (-0.5 ± 1); alanine (-0.5); histidine (-0.5); cysteine (-1.0);
methionine (-1.3); valine (-1.5); leucine (-1.8); isoleucine (-1.8); tyrosine (-2.3);
phenylalanine (-2.5); tryptophan (-3.4).

It is understood that an amino acid can be substituted for another having a similar
hydrophilicity value and still obtain a biologically equivalent and immunologically
equivalent protein. In such changes, the substitution of amino acids whose hydrophilicity
values are within ±2 is preferred, those that are within ±1 are particularly preferred, and
those within ±0.5 are even more particularly preferred.
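
The three preference tiers can be applied mechanically once the hydrophilicity values are tabulated. The sketch below is an illustration under stated assumptions: it uses the central values recited above from U.S. Patent 4,554,101 (ignoring the ± 1 ranges given for aspartate, glutamate and proline) and stores them in tenths so that the comparisons are exact. The function name is hypothetical.

```python
# Hydrophilicity values recited above, stored in tenths (e.g. +3.0 -> 30).
HYDROPHILICITY_TENTHS = {
    "R": 30, "K": 30, "D": 30, "E": 30, "S": 3, "N": 2, "Q": 2, "G": 0,
    "T": -4, "P": -5, "A": -5, "H": -5, "C": -10, "M": -13, "V": -15,
    "L": -18, "I": -18, "Y": -23, "F": -25, "W": -34,
}


def substitution_tier(original, replacement):
    """Classify a substitution by the difference in hydrophilicity values.

    Returns the tightest tier that applies, or None if the difference
    exceeds 2.0 and the substitution falls outside all three tiers.
    """
    diff = abs(HYDROPHILICITY_TENTHS[original] - HYDROPHILICITY_TENTHS[replacement])
    if diff <= 5:
        return "even more particularly preferred"
    if diff <= 10:
        return "particularly preferred"
    if diff <= 20:
        return "preferred"
    return None


if __name__ == "__main__":
    for pair in [("R", "K"), ("S", "T"), ("A", "L"), ("K", "W")]:
        print(pair, substitution_tier(*pair))
```

The same function could be applied to the hydropathic indices of Kyte & Doolittle by swapping in that table, since the text recites identical ±2, ±1 and ±0.5 tiers for both scales.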

As outlined above, amino acid substitutions are generally based on the relative
similarity of the amino acid side-chain substituents, for example, their hydrophobicity,
hydrophilicity, charge, size, and the like. Exemplary substitutions that take various of the
foregoing characteristics into consideration are well known to those of skill in the art and
include: arginine and lysine; glutamate and aspartate; serine and threonine; glutamine and
asparagine; and valine, leucine and isoleucine.



F. Site-Specific Mutagenesis 

Site-specific mutagenesis is a technique useful in the preparation of individual
peptides, or biologically functional equivalent proteins or peptides, through specific
mutagenesis of the underlying DNA. The technique further provides a ready ability to
prepare and test sequence variants, incorporating one or more of the foregoing
considerations, by introducing one or more nucleotide sequence changes into the DNA.
Site-specific mutagenesis allows the production of mutants through the use of specific
oligonucleotide sequences which encode the DNA sequence of the desired mutation, as
well as a sufficient number of adjacent nucleotides, to provide a primer sequence of
sufficient size and sequence complexity to form a stable duplex on both sides of the
deletion junction being traversed. Typically, a primer of about 17 to 25 nucleotides in
length is preferred, with about 5 to 10 residues on both sides of the junction of the
sequence being altered.
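
The primer geometry described above (altered bases in the middle, matching template on either side) can be sketched as follows. This is an illustration only: the function name, the example template and the choice of eight flanking bases are assumptions, and the primer is written in the same orientation as the template string supplied, so a reverse complement may be needed depending on which strand the single-stranded vector presents.

```python
def design_mutagenic_primer(template, pos, new_bases, flank=8):
    """Build a primer carrying new_bases in place of the template bases
    starting at pos (0-based), with `flank` matching bases on each side."""
    end = pos + len(new_bases)
    if pos - flank < 0 or end + flank > len(template):
        raise ValueError("not enough template on one side of the mutation")
    primer = template[pos - flank:pos] + new_bases + template[end:end + flank]
    if not 17 <= len(primer) <= 25:
        raise ValueError("primer length falls outside the 17 to 25 nt range")
    return primer


if __name__ == "__main__":
    # Hypothetical template; codon GAA at positions 15-17 is changed to GAC.
    template = "ATGGCTAAAGGTCTGGAACGTACCGTTAGCCATTAA"
    print(design_mutagenic_primer(template, 15, "GAC"))
```

With a three-base change and eight flanking bases per side the primer is 19 nucleotides long, inside the recited 17 to 25 nucleotide range.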

In general, the technique of site-specific mutagenesis is well known in the art. As
will be appreciated, the technique typically employs a bacteriophage vector that exists in
both a single stranded and double stranded form. Typical vectors useful in site-directed
mutagenesis include vectors such as the M13 phage. These phage vectors are
commercially available and their use is generally well known to those skilled in the art.
Double stranded plasmids are also routinely employed in site directed mutagenesis, which
eliminates the step of transferring the gene of interest from a phage to a plasmid.

In general, site-directed mutagenesis is performed by first obtaining a single-stranded
vector, or by melting apart the two strands of a double stranded vector, which includes
within its sequence a DNA sequence encoding the desired protein. An oligonucleotide
primer bearing the desired mutated sequence is synthetically prepared. This primer is
then annealed with the single-stranded DNA preparation, and subjected to DNA
polymerizing enzymes such as E. coli polymerase I Klenow fragment, in order to
complete the synthesis of the mutation-bearing strand. Thus, a heteroduplex is formed
wherein one strand encodes the original non-mutated sequence and the second strand
bears the desired mutation. This heteroduplex vector is then used to transform
appropriate cells, such as E. coli cells, and clones are selected that include recombinant
vectors bearing the mutated sequence arrangement.

The preparation of sequence variants of the selected gene using site-directed
mutagenesis is provided as a means of producing potentially useful species and is not
meant to be limiting, as there are other ways in which sequence variants of genes may be
obtained. For example, recombinant vectors encoding the desired gene may be treated
with mutagenic agents, such as hydroxylamine, to obtain sequence variants.

G. Expression and Purification of Encoded Proteins 

i. Expression of Proteins from Cloned cDNAs

The cDNA species specified in SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ
ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID
NO:17, SEQ ID NO:19, and SEQ ID NO:21 can be expressed as encoded peptides or
proteins. The engineering of DNA segment(s) for expression in a prokaryotic or
eukaryotic system may be performed by techniques generally known to those of skill in
recombinant expression. It is believed that virtually any expression system may be
employed in the expression of the claimed nucleic acid sequences.

Both cDNA and genomic sequences are suitable for eukaryotic expression, as the
host cell will generally process the genomic transcripts to yield functional mRNA for
translation into protein. Generally speaking, it may be more convenient to employ as the
recombinant gene a cDNA version of the gene. It is believed that the use of a cDNA
version will provide advantages in that the size of the gene will generally be much
smaller and more readily employed to transfect the targeted cell than will a genomic gene,
which will typically be up to an order of magnitude larger than the cDNA gene.
However, the inventor does not exclude the possibility of employing a genomic version of
a particular gene where desired.

As used herein, the terms "engineered" and "recombinant" cells are intended to
refer to a cell into which an exogenous DNA segment or gene, such as a cDNA or gene,
has been introduced. Therefore, engineered cells are distinguishable from naturally
occurring cells which do not contain a recombinantly introduced exogenous DNA
segment or gene. Engineered cells are thus cells having a gene or genes introduced
through the hand of man. Recombinant cells include those having an introduced cDNA
or genomic DNA, and also include genes positioned adjacent to a promoter not naturally
associated with the particular introduced gene.

To express a recombinant encoded protein or peptide, whether mutant or wild-type,
in accordance with the present invention one would prepare an expression vector
that comprises one of the claimed isolated nucleic acids under the control of one or more
promoters. To bring a coding sequence "under the control of" a promoter, one positions
the 5' end of the translational initiation site of the reading frame generally between about
1 and 50 nucleotides "downstream" of (i.e., 3' of) the chosen promoter. The "upstream"
promoter stimulates transcription of the inserted DNA and promotes expression of the
encoded recombinant protein. This is the meaning of "recombinant expression" in the
context used here.

Many standard techniques are available to construct expression vectors containing
the appropriate nucleic acids and transcriptional/translational control sequences in order
to achieve protein or peptide expression in a variety of host-expression systems. Cell
types available for expression include, but are not limited to, bacteria, such as E. coli and
B. subtilis transformed with recombinant phage DNA, plasmid DNA or cosmid DNA
expression vectors.

Certain examples of prokaryotic hosts are E. coli strain RR1, E. coli LE392,
E. coli B, E. coli χ1776 (ATCC No. 31537) as well as E. coli W3110 (F-, lambda-,
prototrophic, ATCC No. 273325); bacilli such as Bacillus subtilis; and other
enterobacteriaceae such as Salmonella typhimurium, Serratia marcescens, and various
Pseudomonas species.

In general, plasmid vectors containing replicon and control sequences that are
derived from species compatible with the host cell are used in connection with these
hosts. The vector ordinarily carries a replication site, as well as marking sequences that
are capable of providing phenotypic selection in transformed cells. For example, E. coli
is often transformed using pBR322, a plasmid derived from an E. coli species. Plasmid
pBR322 contains genes for ampicillin and tetracycline resistance and thus provides easy
means for identifying transformed cells. The pBR322 plasmid, or other microbial
plasmid or phage must also contain, or be modified to contain, promoters that can be used
by the microbial organism for expression of its own proteins.

In addition, phage vectors containing replicon and control sequences that are 
compatible with the host microorganism can be used as transforming vectors in 
connection with these hosts. For example, the phage lambda GEM™-11 may be utilized 
in making a recombinant phage vector that can be used to transform host cells, such as E. 
coli LE392. 

Further useful vectors include pIN vectors (Inouye et al., 1985); and pGEX 
vectors, for use in generating glutathione S-transferase (GST) soluble fusion proteins for 
later purification and separation or cleavage. Other suitable fusion proteins are those with 
β-galactosidase, ubiquitin, or the like. 

Promoters that are most commonly used in recombinant DNA construction 
include the β-lactamase (penicillinase), lactose and tryptophan (trp) promoter systems. 
While these are the most commonly used, other microbial promoters have been 
discovered and utilized, and details concerning their nucleotide sequences have been 
published, enabling those of skill in the art to ligate them functionally with plasmid 
vectors. 

For expression in Saccharomyces, the plasmid YRp7, for example, is commonly 
used (Stinchcomb et al., 1979; Kingsman et al., 1979; Tschemper et al., 1980). This 
plasmid contains the trp1 gene, which provides a selection marker for a mutant strain of 
yeast lacking the ability to grow in tryptophan, for example ATCC No. 44076 or PEP4-1 
(Jones, 1977). The presence of the trp1 lesion as a characteristic of the yeast host cell 
genome then provides an effective environment for detecting transformation by growth in 
the absence of tryptophan. 

Suitable promoting sequences in yeast vectors include the promoters for 
3-phosphoglycerate kinase (Hitzeman et al., 1980) or other glycolytic enzymes (Hess et 
al., 1968; Holland et al., 1978), such as enolase, glyceraldehyde-3-phosphate 
dehydrogenase, hexokinase, pyruvate decarboxylase, phosphofructokinase, glucose-6- 
phosphate isomerase, 3-phosphoglycerate mutase, pyruvate kinase, triosephosphate 
isomerase, phosphoglucose isomerase, and glucokinase. In constructing suitable 
expression plasmids, the termination sequences associated with these genes are also 
ligated into the expression vector 3' of the sequence desired to be expressed to provide 
polyadenylation of the mRNA and termination. 

Other suitable promoters, which have the additional advantage of transcription 
controlled by growth conditions, include the promoter region for alcohol dehydrogenase 
2, isocytochrome C, acid phosphatase, degradative enzymes associated with nitrogen 
metabolism, and the aforementioned glyceraldehyde-3-phosphate dehydrogenase, and 
enzymes responsible for maltose and galactose utilization. 

In addition to micro-organisms, cultures of cells derived from multicellular 
organisms may also be used as hosts. In principle, any such cell culture is workable, 
whether from vertebrate or invertebrate culture. In addition to mammalian cells, these 
include insect cell systems infected with recombinant virus expression vectors (e.g., 
baculovirus); and plant cell systems infected with recombinant virus expression vectors 
(e.g., cauliflower mosaic virus, CaMV; tobacco mosaic virus, TMV) or transformed with 
recombinant plasmid expression vectors (e.g., Ti plasmid) containing one or more coding 
sequences. 

In a useful insect system, Autographa californica nuclear polyhedrosis virus 
(AcNPV) is used as a vector to express foreign genes. The virus grows in Spodoptera 
frugiperda cells. The isolated nucleic acid coding sequences are cloned into non-essential 
regions (for example the polyhedrin gene) of the virus and placed under control 
of an AcNPV promoter (for example, the polyhedrin promoter). Successful insertion of 
the coding sequences results in the inactivation of the polyhedrin gene and production of 
non-occluded recombinant virus (i.e., virus lacking the proteinaceous coat coded for by 
the polyhedrin gene). These recombinant viruses are then used to infect Spodoptera 
frugiperda cells in which the inserted gene is expressed (e.g., U.S. Patent No. 4,215,051). 

Examples of useful mammalian host cell lines are VERO and HeLa cells, Chinese 
hamster ovary (CHO) cell lines, WI38, BHK, COS-7, 293, HepG2, NIH3T3, RIN and 
MDCK cell lines. In addition, a host cell may be chosen that modulates the expression of 
the inserted sequences, or modifies and processes the gene product in the specific fashion 
desired. Such modifications (e.g., glycosylation) and processing (e.g., cleavage) of 
protein products may be important for the function of the encoded protein. 

Different host cells have characteristic and specific mechanisms for the post-translational 
processing and modification of proteins. Appropriate cell lines or host 
systems can be chosen to ensure the correct modification and processing of the foreign 
protein expressed. Expression vectors for use in mammalian cells ordinarily include an 
origin of replication (as necessary), a promoter located in front of the gene to be 
expressed, along with any necessary ribosome binding sites, RNA splice sites, 
polyadenylation site, and transcriptional terminator sequences. The origin of replication 
may be provided either by construction of the vector to include an exogenous origin, such 
as may be derived from SV40 or other viral (e.g., Polyoma, Adeno, VSV, BPV) source, 
or may be provided by the host cell chromosomal replication mechanism. If the vector is 
integrated into the host cell chromosome, the latter is often sufficient. 

The promoters may be derived from the genome of mammalian cells (e.g., 
metallothionein promoter) or from mammalian viruses (e.g., the adenovirus late 
promoter; the vaccinia virus 7.5K promoter). Further, it is also possible, and may be 
desirable, to utilize promoter or control sequences normally associated with the desired 
gene sequence, provided such control sequences are compatible with the host cell 
systems. 

A number of viral-based expression systems may be utilized. For example, 
commonly used promoters are derived from polyoma, Adenovirus 2, cytomegalovirus and 
Simian Virus 40 (SV40). The early and late promoters of SV40 virus are useful because 
both are obtained easily from the virus as a fragment which also contains the SV40 viral 
origin of replication. Smaller or larger SV40 fragments may also be used, provided there 
is included the approximately 250 bp sequence extending from the HindIII site toward 
the BglI site located in the viral origin of replication. 

In cases where an adenovirus is used as an expression vector, the coding 
sequences may be ligated to an adenovirus transcription/translation control complex, e.g., 
the late promoter and tripartite leader sequence. This chimeric gene may then be inserted 
in the adenovirus genome by in vitro or in vivo recombination. Insertion in a non-essential 
region of the viral genome (e.g., region E1 or E3) will result in a recombinant 
virus that is viable and capable of expressing proteins in infected hosts. 

Specific initiation signals may also be required for efficient translation of the 
claimed isolated nucleic acid coding sequences. These signals include the ATG initiation 
codon and adjacent sequences. Exogenous translational control signals, including the 
ATG initiation codon, may additionally need to be provided. One of ordinary skill in the 
art would readily be capable of determining this need and providing the necessary signals. 
It is well known that the initiation codon must be in-frame (or in-phase) with the reading 
frame of the desired coding sequence to ensure translation of the entire insert. These 
exogenous translational control signals and initiation codons can be of a variety of 
origins, both natural and synthetic. The efficiency of expression may be enhanced by the 
inclusion of appropriate transcription enhancer elements or transcription terminators 
(Bittner et al., 1987). 
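
By way of illustration only, the in-frame requirement described above can be sketched in a few lines of Python. The function name, its arguments and the toy sequences are hypothetical and are not taken from the present disclosure; the sketch simply checks that an exogenous ATG lies a whole number of codons upstream of the first base of the coding sequence and that no stop codon intervenes.

```python
# Illustrative sketch only: names and sequences are hypothetical assumptions.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def atg_in_frame(construct: str, atg_pos: int, cds_start: int) -> bool:
    """Return True if the ATG at atg_pos (0-based) is in-frame with the
    coding sequence beginning at cds_start and no stop codon lies between."""
    seq = construct.upper()
    # The candidate initiation codon must actually be ATG.
    if seq[atg_pos:atg_pos + 3] != "ATG":
        return False
    # In-frame means the spacing is a whole number of codons.
    if cds_start < atg_pos or (cds_start - atg_pos) % 3 != 0:
        return False
    # A stop codon between the ATG and the insert would truncate translation.
    for i in range(atg_pos, cds_start, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return False
    return True


in_frame = atg_in_frame("CCATGGCTAAAGGTTTT", atg_pos=2, cds_start=8)
out_of_frame = atg_in_frame("CCATGGCTAAAGGTTTT", atg_pos=2, cds_start=9)
blocked_by_stop = atg_in_frame("CCATGTAAAAAGGT", atg_pos=2, cds_start=8)
```

In this toy example only the first construct satisfies the check; the second is shifted by one nucleotide and the third carries a TAA codon between the ATG and the insert.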

In eukaryotic expression, one will also typically desire to incorporate into the 
transcriptional unit an appropriate polyadenylation site (e.g., 5'-AATAAA-3', SEQ ID 
NO:30) if one was not contained within the original cloned segment. Typically, the poly 
A addition site is placed about 30 to 2000 nucleotides "downstream" of the termination 
site of the protein at a position prior to transcription termination. 
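
As a hedged illustration of the placement described above, the following Python sketch locates AATAAA signals downstream of a coding sequence and tests whether each falls within about 30 to 2000 nucleotides. It assumes, for the purpose of the example only, that the "termination site of the protein" is the end of the stop codon; the function names and the toy sequence are hypothetical.

```python
# Illustrative sketch only: names, sequence and the interpretation of the
# "termination site" as the end of the stop codon are assumptions.

def poly_a_signal_offsets(transcription_unit: str, stop_codon_end: int,
                          signal: str = "AATAAA") -> list:
    """Return the distance (nt) from the end of the stop codon to the start
    of each downstream occurrence of the polyadenylation signal."""
    seq = transcription_unit.upper()
    offsets = []
    i = seq.find(signal, stop_codon_end)
    while i != -1:
        offsets.append(i - stop_codon_end)
        i = seq.find(signal, i + 1)
    return offsets


def within_typical_window(offset: int, low: int = 30, high: int = 2000) -> bool:
    """Check the 'about 30 to 2000 nucleotides downstream' placement."""
    return low <= offset <= high


# Toy unit: a 9-nt coding region (ATG GCC TAA), a 40-nt spacer, then AATAAA.
unit = "ATGGCCTAA" + "C" * 40 + "AATAAA" + "GG"
offsets = poly_a_signal_offsets(unit, stop_codon_end=9)
placement_ok = [within_typical_window(o) for o in offsets]
```

A construct in which no offset falls inside the window would be a candidate for insertion of an exogenous polyadenylation site.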

For long-term, high-yield production of recombinant proteins, stable expression is 
preferred. For example, cell lines that stably express constructs encoding proteins may be 
engineered. Rather than using expression vectors that contain viral origins of replication, 
host cells can be transformed with vectors controlled by appropriate expression control 
elements (e.g., promoter, enhancer sequences, transcription terminators, polyadenylation 
sites, etc.), and a selectable marker. Following the introduction of foreign DNA, 
engineered cells may be allowed to grow for 1-2 days in an enriched medium, and then 
are switched to a selective medium. The selectable marker in the recombinant plasmid 
confers resistance to the selection and allows cells to stably integrate the plasmid into 
their chromosomes and grow to form foci, which in turn can be cloned and expanded into 
cell lines. 

A number of selection systems may be used, including, but not limited to, the 
herpes simplex virus thymidine kinase (Wigler et al., 1977), hypoxanthine-guanine 
phosphoribosyltransferase (Szybalska et al., 1962) and adenine phosphoribosyltransferase 
genes (Lowy et al., 1980), in tk-, hgprt- or aprt- cells, respectively. Also, antimetabolite 
resistance can be used as the basis of selection for dhfr, which confers resistance to 
methotrexate (Wigler et al., 1980; O'Hare et al., 1981); gpt, which confers resistance to 
mycophenolic acid (Mulligan et al., 1981); neo, which confers resistance to the 
aminoglycoside G-418 (Colberre-Garapin et al., 1981); and hygro, which confers 
resistance to hygromycin. 

It is contemplated that the isolated nucleic acids of the invention may be 
"overexpressed", i.e., expressed in increased levels relative to its natural expression in 
human cells, or even relative to the expression of other proteins in the recombinant host 
cell. Such overexpression may be assessed by a variety of methods, including radio- 
labeling and/or protein purification. However, simple and direct methods are preferred, 
for example, those involving SDS/PAGE and protein staining or western blotting, 
followed by quantitative analyses, such as densitometric scanning of the resultant gel or 
blot. A specific increase in the level of the recombinant protein or peptide in comparison 
to the level in natural human cells is indicative of overexpression, as is a relative 
abundance of the specific protein in relation to the other proteins produced by the host 
cell and, e.g., visible on a gel. 

2. Purification of Expressed Proteins 

Further aspects of the present invention concern the purification, and in particular 
embodiments, the substantial purification, of an encoded protein or peptide. The term 
"purified protein or peptide" as used herein, is intended to refer to a composition, 
isolatable from other components, wherein the protein or peptide is purified to any degree 
relative to its naturally-obtainable state, i.e., in this case, relative to its purity within a 
hepatocyte or β-cell extract. A purified protein or peptide therefore also refers to a 
protein or peptide, free from the environment in which it may naturally occur. 

Generally, "purified" will refer to a protein or peptide composition that has been 
subjected to fractionation to remove various other components, and which composition 
substantially retains its expressed biological activity. Where the term "substantially 
purified" is used, this designation will refer to a composition in which the protein or 
peptide forms the major component of the composition, such as constituting about 50% 
or more of the proteins in the composition. 

Various methods for quantifying the degree of purification of the protein or 
peptide will be known to those of skill in the art in light of the present disclosure. These 
include, for example, determining the specific activity of an active fraction, or assessing 
the number of polypeptides within a fraction by SDS/PAGE analysis. A preferred 
method for assessing the purity of a fraction is to calculate the specific activity of the 
fraction, to compare it to the specific activity of the initial extract, and to thus calculate 
the degree of purity, herein assessed by a "-fold purification number". The actual units 
used to represent the amount of activity will, of course, be dependent upon the particular 
assay technique chosen to follow the purification and whether or not the expressed 
protein or peptide exhibits a detectable activity. 
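
The "-fold purification number" described above can be illustrated with a minimal Python sketch. The activity and protein figures are invented for the example and the function names are hypothetical; the units of activity would depend on the assay chosen.

```python
# Illustrative sketch only: all numerical values are hypothetical.

def specific_activity(total_activity_units: float, total_protein_mg: float) -> float:
    """Specific activity = activity units per mg of total protein."""
    if total_protein_mg <= 0:
        raise ValueError("total protein must be positive")
    return total_activity_units / total_protein_mg


def fold_purification(fraction_sa: float, extract_sa: float) -> float:
    """Ratio of a fraction's specific activity to that of the initial extract."""
    if extract_sa <= 0:
        raise ValueError("extract specific activity must be positive")
    return fraction_sa / extract_sa


# Hypothetical initial extract: 1000 units of activity in 500 mg protein.
extract_sa = specific_activity(1000.0, 500.0)
# Hypothetical purified fraction: 600 units of activity in 6 mg protein.
fraction_sa = specific_activity(600.0, 6.0)

fold = fold_purification(fraction_sa, extract_sa)
recovery_percent = 100.0 * 600.0 / 1000.0  # activity recovered in the fraction
```

With these invented figures the fraction is 50-fold purified while retaining 60% of the starting activity, which shows why fold purification and total recovery are usually reported together.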

Various techniques suitable for use in protein purification will be well known to 
those of skill in the art. These include, for example, precipitation with ammonium 
sulphate, polyethylene glycol, antibodies and the like or by heat denaturation, followed by 
centrifugation; chromatography steps such as ion exchange, gel filtration, reverse phase, 
hydroxylapatite and affinity chromatography; isoelectric focusing; gel electrophoresis; 
and combinations of such and other techniques. As is generally known in the art, it is 
believed that the order of conducting the various purification steps may be changed, or 
that certain steps may be omitted, and still result in a suitable method for the preparation 
of a substantially purified protein or peptide. 

There is no general requirement that the protein or peptide always be provided in 
its most purified state. Indeed, it is contemplated that less substantially purified 
products will have utility in certain embodiments. Partial purification may be 
accomplished by using fewer purification steps in combination, or by utilizing different 
forms of the same general purification scheme. For example, it is appreciated that a 
cation-exchange column chromatography performed utilizing an HPLC apparatus will 
generally result in a greater -fold purification than the same technique utilizing a low 
pressure chromatography system. Methods exhibiting a lower degree of relative 
purification may have advantages in total recovery of protein product, or in maintaining 
the activity of an expressed protein. 

It is known that the migration of a polypeptide can vary, sometimes significantly, 
with different conditions of SDS/PAGE (Capaldi et al., Biochem. Biophys. Res. Comm., 
76:425, 1977). It will therefore be appreciated that under differing electrophoresis 
conditions, the apparent molecular weights of purified or partially purified expression 
products may vary. 

H. Preparation of Antibodies Specific for Encoded Proteins 

Antibody Generation 

For some embodiments, it will be desired to produce antibodies that bind with 
high specificity to the protein product(s) of an isolated nucleic acid selected from the 
group comprising the sequences in SEQ ID NO:1, or any mutant of calpain 10. Means for 
preparing and characterizing antibodies are well known in the art (See, e.g., Antibodies: 
A Laboratory Manual, Cold Spring Harbor Laboratory, 1988, incorporated herein by 
reference). 

Methods for generating polyclonal antibodies are well known in the art. Briefly, a 
polyclonal antibody is prepared by immunizing an animal with an antigenic composition 
and collecting antisera from that immunized animal. A wide range of animal species can 
be used for the production of antisera. Typically the animal used for production of 
antisera is a rabbit, a mouse, a rat, a hamster, a guinea pig or a goat. Because of the 
relatively large blood volume of rabbits, a rabbit is a preferred choice for production of 
polyclonal antibodies. 

As is well known in the art, a given composition may vary in its immunogenicity. 
It is often necessary therefore to boost the host immune system, as may be achieved by 
coupling a peptide or polypeptide immunogen to a carrier. Exemplary and preferred 
carriers are keyhole limpet hemocyanin (KLH) and bovine serum albumin (BSA). Other 
albumins such as ovalbumin, mouse serum albumin or rabbit serum albumin can also be 
used as carriers. Means for conjugating a polypeptide to a carrier protein are well known 
in the art and include glutaraldehyde, m-maleimidobenzoyl-N-hydroxysuccinimide ester, 
carbodiimide and bis-diazotized benzidine. 

As is also well known in the art, the immunogenicity of a particular immunogen 
composition can be enhanced by the use of non-specific stimulators of the immune 
response, known as adjuvants. Exemplary and preferred adjuvants include complete 
Freund's adjuvant (a non-specific stimulator of the immune response containing killed 
Mycobacterium tuberculosis), incomplete Freund's adjuvants and aluminum hydroxide 
adjuvant. 

The amount of immunogen composition used in the production of polyclonal 
antibodies varies with the nature of the immunogen as well as the animal used for 
immunization. A variety of routes can be used to administer the immunogen 
(subcutaneous, intramuscular, intradermal, intravenous and intraperitoneal). The 
production of polyclonal antibodies may be monitored by sampling blood of the 
immunized animal at various points following immunization. A second, booster 
injection may also be given. The process of boosting and titering is repeated until a 
suitable titer is achieved. When a desired level of immunogenicity is obtained, the 
immunized animal can be bled and the serum isolated and stored, and/or in some cases 
the animal can be used to generate monoclonal antibodies (MAbs). For production of 
rabbit polyclonal antibodies, the animal can be bled through an ear vein or alternatively 
by cardiac puncture. The removed blood is allowed to coagulate and then centrifuged to 
separate serum components from whole cells and blood clots. The serum may be used as 
is for various applications or the desired antibody fraction may be purified by well-known 
methods, such as affinity chromatography using another antibody or a peptide bound to a 
solid matrix. 

Monoclonal antibodies (MAbs) may be readily prepared through use of well- 
known techniques, such as those exemplified in U.S. Patent 4,196,265, incorporated 
herein by reference. Typically, this technique involves immunizing a suitable animal 
with a selected immunogen composition, e.g., a purified or partially purified expressed 
protein, polypeptide or peptide. The immunizing composition is administered in a 
manner that effectively stimulates antibody-producing cells. 

The methods for generating monoclonal antibodies (MAbs) generally begin along 
the same lines as those for preparing polyclonal antibodies. Rodents such as mice and 
rats are preferred animals; however, the use of rabbit, sheep or frog cells is also possible. 
The use of rats may provide certain advantages (Goding, 1986, pp. 60-61), but mice are 
preferred, with the BALB/c mouse being most preferred as this is most routinely used and 
generally gives a higher percentage of stable fusions. 

The animals are injected with antigen as described above. The antigen may be 
coupled to carrier molecules such as keyhole limpet hemocyanin if necessary. The 
antigen would typically be mixed with adjuvant, such as Freund's complete or incomplete 
adjuvant. Booster injections with the same antigen would occur at approximately two-week 
intervals. 

Following immunization, somatic cells with the potential for producing 
antibodies, specifically B lymphocytes (B cells), are selected for use in the MAb 
generating protocol. These cells may be obtained from biopsied spleens, tonsils or lymph 
nodes, or from a peripheral blood sample. Spleen cells and peripheral blood cells are 
preferred, the former because they are a rich source of antibody-producing cells that are in 
the dividing plasmablast stage, and the latter because peripheral blood is easily accessible. 
Often, a panel of animals will have been immunized and the spleen of the animal with the 
highest antibody titer will be removed and the spleen lymphocytes obtained by 
homogenizing the spleen with a syringe. Typically, a spleen from an immunized mouse 
contains approximately 5 × 10⁷ to 2 × 10⁸ lymphocytes. 

The antibody-producing B lymphocytes from the immunized animal are then 
fused with cells of an immortal myeloma cell, generally one of the same species as the 
animal that was immunized. Myeloma cell lines suited for use in hybridoma-producing 
fusion procedures preferably are non-antibody-producing, have high fusion efficiency, 
and have enzyme deficiencies that render them incapable of growing in certain selective 
media that support the growth of only the desired fused cells (hybridomas). 

Any one of a number of myeloma cells may be used, as are known to those of skill 
in the art (Goding, 1986). For example, where the immunized animal is a mouse, one 
may use P3-X63/Ag8, X63-Ag8.653, NS1/1.Ag 4 1, Sp210-Ag14, FO, NSO/U, MPC-11, 
MPC11-X45-GTG 1.7 and S194/5XX0 Bul; for rats, one may use R210.RCY3, Y3-Ag 
1.2.3, ER983F and 4B210; and U-266, GM1500-GRG2, LICR-LON-HMy2 and UC729-6 
are all useful in connection with human cell fusions. 

One preferred murine myeloma cell is the NS-1 myeloma cell line (also termed 
P3-NS-1-Ag4-1), which is readily available from the NIGMS Human Genetic Mutant 
Cell Repository by requesting cell line repository number GM3573. Another mouse 
myeloma cell line that may be used is the 8-azaguanine-resistant mouse murine myeloma 
SP2/0 non-producer cell line. 

Methods for generating hybrids of antibody-producing spleen or lymph node cells 
and myeloma cells usually comprise mixing somatic cells with myeloma cells in a 2:1 
proportion, though the proportion may vary from about 20:1 to about 1:1, respectively, in 
the presence of an agent or agents (chemical or electrical) that promote the fusion of cell 
membranes. Fusion methods using Sendai virus have been described by Kohler and 
Milstein (1975; 1976), and those using polyethylene glycol (PEG), such as 37% (v/v) 
PEG, by Gefter et al. (1977). The use of electrically induced fusion methods is also 
appropriate (Goding, 1986). 

Fusion procedures usually produce viable hybrids at low frequencies, about 
1 × 10⁻⁶ to 1 × 10⁻⁸. However, this low frequency does not pose a problem, as the viable, 
fused hybrids are differentiated from the parental, unfused cells (particularly the unfused 
myeloma cells that would normally continue to divide indefinitely) by culturing in a 
selective medium. The selective medium is generally one that contains an agent that 
blocks the de novo synthesis of nucleotides in the tissue culture media. Exemplary and 
preferred agents are aminopterin, methotrexate, and azaserine. Aminopterin and 
methotrexate block de novo synthesis of both purines and pyrimidines, whereas azaserine 
blocks only purine synthesis. Where aminopterin or methotrexate is used, the media is 
supplemented with hypoxanthine and thymidine as a source of nucleotides (HAT 
medium). Where azaserine is used, the media is supplemented with hypoxanthine. 
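
To give a rough sense of scale, the figures recited above can be combined in a short Python sketch. It assumes, purely for illustration, that the stated fusion frequency applies per input spleen lymphocyte; the disclosure does not state the denominator, and the function name is hypothetical.

```python
# Illustrative arithmetic only. Assumption: the fusion frequency is taken
# per input spleen lymphocyte, which the text does not specify.

def expected_hybrids(input_cells: float, fusion_frequency: float) -> float:
    """Expected number of viable hybrids from one fusion."""
    return input_cells * fusion_frequency


# Spleen yield recited in the text: about 5e7 to 2e8 lymphocytes.
# Fusion frequency recited in the text: about 1e-6 to 1e-8.
fewest = expected_hybrids(5e7, 1e-8)   # low cell yield, low frequency
most = expected_hybrids(2e8, 1e-6)     # high cell yield, high frequency
```

Under this assumption a single spleen would be expected to yield on the order of less than one to a few hundred viable hybrids, which is consistent with the need for a selective medium to recover them from the far larger unfused population.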

The preferred selection medium is HAT. Only cells capable of operating 
nucleotide salvage pathways are able to survive in HAT medium. The myeloma cells are 
defective in key enzymes of the salvage pathway, e.g., hypoxanthine phosphoribosyl 
transferase (HPRT), and thus they cannot survive. The B cells can operate this pathway, 
but they have a limited life span in culture and generally die within about two weeks. 
Therefore, the only cells that can survive in the selective media are those hybrids formed 
from myeloma and B cells. 
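
The selection logic just described can be summarized in a small Python sketch. This is a deliberately simplified model with hypothetical names: it treats long-term survival in HAT medium as requiring both a functional salvage pathway (HPRT) and the capacity for indefinite division, and it assumes a hybridoma inherits the former from the B cell parent and the latter from the myeloma parent.

```python
# Simplified illustrative model only: names and the two-trait reduction are
# assumptions made for this sketch.

def survives_hat(has_hprt: bool, immortal: bool) -> bool:
    """De novo nucleotide synthesis is blocked in HAT medium, so a cell needs
    the salvage pathway (HPRT); it must also divide indefinitely to persist
    beyond the roughly two-week life span of primary B cells."""
    return has_hprt and immortal


CELL_TYPES = {
    "unfused myeloma cell": {"has_hprt": False, "immortal": True},
    "unfused B cell": {"has_hprt": True, "immortal": False},
    "hybridoma": {"has_hprt": True, "immortal": True},
}

survivors = [name for name, traits in CELL_TYPES.items()
             if survives_hat(**traits)]
```

In this model only the fused hybrid passes both conditions, matching the outcome stated in the text.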

This culturing provides a population of hybridomas from which specific 
hybridomas are selected. Typically, selection of hybridomas is performed by culturing 
the cells by single-clone dilution in microtiter plates, followed by testing the individual 
clonal supernatants (after about two to three weeks) for the desired reactivity. The assay 
should be sensitive, simple and rapid, such as radioimmunoassays, enzyme 
immunoassays, cytotoxicity assays, plaque assays, dot immunobinding assays, and the 
like. 

The selected hybridomas would then be serially diluted and cloned into individual 
antibody-producing cell lines, which can then be propagated indefinitely to provide 
MAbs. The cell lines may be exploited for MAb production in two basic ways. A sample 
of the hybridoma can be injected (often into the peritoneal cavity) into a histocompatible 
animal of the type that was used to provide the somatic and myeloma cells for the original 
fusion. The injected animal develops tumors secreting the specific monoclonal antibody 
produced by the fused cell hybrid. The body fluids of the animal, such as serum or ascites 
fluid, can then be tapped to provide MAbs in high concentration. The individual cell 
lines could also be cultured in vitro, where the MAbs are naturally secreted into the 
culture medium from which they can be readily obtained in high concentrations. MAbs 
produced by either means may be further purified, if desired, using filtration, 
centrifugation and various chromatographic methods such as HPLC or affinity 
chromatography. 

Large amounts of the monoclonal antibodies of the present invention may also be 
obtained by multiplying hybridoma cells in vivo. Cell clones are injected into mammals 
that are histocompatible with the parent cells, e.g., syngeneic mice, to cause growth of 
antibody-producing tumors. Optionally, the animals are primed with a hydrocarbon, 
especially oils such as pristane (tetramethylpentadecane) prior to injection. 

In accordance with the present invention, fragments of the monoclonal antibody of 
the invention can be obtained from the monoclonal antibody produced as described 
above, by methods which include digestion with enzymes such as pepsin or papain and/or 
cleavage of disulfide bonds by chemical reduction. Alternatively, monoclonal antibody 
fragments encompassed by the present invention can be synthesized using an automated 
peptide synthesizer, or by expression of full-length gene or of gene fragments in E. coli. 

The monoclonal conjugates of the present invention are prepared by methods 
known in the art, e.g., by reacting a monoclonal antibody prepared as described above 
with, for instance, an enzyme in the presence of a coupling agent such as glutaraldehyde 
or periodate. Conjugates with fluorescein markers are prepared in the presence of these 
coupling agents or by reaction with an isothiocyanate. Conjugates with metal chelates are 
similarly produced. Other moieties to which antibodies may be conjugated include 
radionuclides such as ³H, ¹²⁵I, ¹³¹I, ³²P, ³⁵S, ¹⁴C, ⁵¹Cr, ³⁶Cl, ⁵⁷Co, ⁵⁸Co, ⁵⁹Fe, ⁷⁵Se, ¹⁵²Eu 
and ⁹⁹ᵐTc. Radioactively 
labeled monoclonal antibodies of the present invention are produced according to 
well-known methods in the art. For instance, monoclonal antibodies can be iodinated by 
contact with sodium or potassium iodide and a chemical oxidizing agent such as sodium 
hypochlorite, or an enzymatic oxidizing agent, such as lactoperoxidase. Monoclonal 
antibodies according to the invention may be labeled with technetium-99 by a ligand 
exchange process, for example, by reducing pertechnetate with stannous solution, chelating 
the reduced technetium onto a Sephadex column and applying the antibody to this column 
or by direct labelling techniques, e.g., by incubating pertechnetate, a reducing agent such as 
SnCl₂, a buffer solution such as sodium-potassium phthalate solution, and the antibody. 

It will be appreciated by those of skill in the art that monoclonal or polyclonal 
antibodies specific for calpain 10 (or any other calpain-like protein involved in diabetes) 
will have utilities in several types of applications. These can include the production of 
diagnostic kits for use in detecting or diagnosing type 2 diabetes. The skilled practitioner 
will realize that such uses are within the scope of the present invention. 

I. Immunodetection Assays 

The immunodetection methods of the present invention have evident utility in the 
diagnosis of conditions such as type 2 diabetes. Here, a biological or clinical sample 
suspected of containing either the encoded protein or peptide or corresponding antibody is 
used. However, these embodiments also have applications to non-clinical samples, such 
as in the titering of antigen or antibody samples, in the selection of hybridomas, and the 
like. 

In the clinical diagnosis or monitoring of patients with type 2 diabetes, the 
detection of an antigen encoded by a calpain 10 encoding nucleic acid, or an increase or 
decrease in the levels of such an antigen, in comparison to the levels in a corresponding 
biological sample from a normal subject is indicative of a patient with type 2 diabetes. 
The basis for such diagnostic methods lies, in part, with the finding that the nucleic acid 
calpain 10 mutants identified in the present invention are responsible for an increased 
susceptibility to type 2 diabetes. 

Those of skill in the art are very familiar with differentiating between significant 
expression of a biomarker, which represents a positive identification, and low level or 
background expression of a biomarker. Indeed, background expression levels are often 
used to form a "cut-off" above which increased staining will be scored as significant or 
positive. Significant expression may be represented by high levels of antigens in tissues 
or within body fluids, or alternatively, by a high proportion of cells from within a tissue 
that each give a positive signal. 
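
One common way of forming such a cut-off, offered here only as a hedged sketch and not as the method of the present disclosure, is to set it at the mean background signal plus a multiple of its standard deviation. The multiplier of 3, the function names and the background readings below are assumptions chosen for illustration.

```python
# Illustrative sketch only: the mean + k*SD rule, k = 3 and all readings
# are assumptions, not values taken from the disclosure.
from statistics import mean, stdev


def cutoff_from_background(background_signals, k: float = 3.0) -> float:
    """Cut-off = mean background + k sample standard deviations."""
    return mean(background_signals) + k * stdev(background_signals)


def score(signal: float, cutoff: float) -> str:
    """Score a sample as positive only if its signal exceeds the cut-off."""
    return "positive" if signal > cutoff else "negative"


# Hypothetical background readings (e.g., absorbance of control wells).
background = [0.10, 0.12, 0.08, 0.11, 0.09]
cutoff = cutoff_from_background(background)

strong_sample = score(0.50, cutoff)
weak_sample = score(0.12, cutoff)
```

A reading near the upper end of the background range is scored negative, while a reading several-fold above background is scored positive; the choice of multiplier trades sensitivity against false positives.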

1. Immunodetection Methods 

In still further embodiments, the present invention concerns immunodetection 
methods for binding, purifying, removing, quantifying or otherwise generally detecting 
biological components. The encoded proteins or peptides of the present invention may be 
employed to detect antibodies having reactivity therewith, or, alternatively, antibodies 
prepared in accordance with the present invention may be employed to detect the 
encoded proteins or peptides. The steps of various useful immunodetection methods have 
been described in the scientific literature, such as, e.g., Nakamura et al. (1987). 

In general, the immunobinding methods include obtaining a sample suspected of 
containing a protein, peptide or antibody, and contacting the sample with an antibody or 
protein or peptide in accordance with the present invention, as the case may be, under 
conditions effective to allow the formation of immunocomplexes. 

The immunobinding methods include methods for detecting or quantifying the 
amount of a reactive component in a sample, which methods require the detection or 
quantitation of any immune complexes formed during the binding process. Here, one 
would obtain a sample suspected of containing a calpain 10 mutant encoded protein, 
peptide or a corresponding antibody, and contact the sample with an antibody or encoded 
protein or peptide, as the case may be, and then detect or quantify the amount of immune 
complexes formed under the specific conditions. 

In terms of antigen detection, the biological sample analyzed may be any sample 
that is suspected of containing a calpain 10 antigen, such as a muscle cell, a homogenized 
tissue extract, an isolated cell, a cell membrane preparation, separated or purified forms 
of any of the above protein-containing compositions, or even any biological fluid that 
comes into contact with diabetic tissue, including blood. 

Contacting the chosen biological sample with the protein, peptide or antibody
under conditions effective and for a period of time sufficient to allow the formation of
immune complexes (primary immune complexes) is generally a matter of simply adding
the composition to the sample and incubating the mixture for a period of time long
enough for the antibodies to form immune complexes with, i.e., to bind to, any antigens
present. After this time, the sample-antibody composition, such as a tissue section,
ELISA plate, dot blot or western blot, will generally be washed to remove any
non-specifically bound antibody species, allowing only those antibodies specifically bound
within the primary immune complexes to be detected.

In general, the detection of immunocomplex formation is well known in the art 
and may be achieved through the application of numerous approaches. These methods 
are generally based upon the detection of a label or marker, such as any radioactive,
fluorescent, biological or enzymatic tags or labels of standard use in the art. U.S. Patents
concerning the use of such labels include 3,817,837; 3,850,752; 3,939,350; 3,996,345; 
4,277,437; 4,275,149 and 4,366,241, each incorporated herein by reference. Of course, 
one may find additional advantages through the use of a secondary binding ligand such as 
a second antibody or a biotin/avidin ligand binding arrangement, as is known in the art. 

The encoded protein, peptide or corresponding antibody employed in the detection 
may itself be linked to a detectable label, wherein one would then simply detect this label, 
thereby allowing the amount of the primary immune complexes in the composition to be 
determined. 

Alternatively, the first added component that becomes bound within the primary 
immune complexes may be detected by means of a second binding ligand that has binding 
affinity for the encoded protein, peptide or corresponding antibody. In these cases, the 
second binding ligand may be linked to a detectable label. The second binding ligand is 
itself often an antibody, which may thus be termed a "secondary" antibody. The primary 
immune complexes are contacted with the labeled, secondary binding ligand, or antibody, 
under conditions effective and for a period of time sufficient to allow the formation of 
secondary immune complexes. The secondary immune complexes are then generally 
washed to remove any non-specifically bound labeled secondary antibodies or ligands, 
and the remaining label in the secondary immune complexes is then detected. 

Further methods include the detection of primary immune complexes by a
two-step approach. A second binding ligand, such as an antibody, that has binding affinity for
the encoded protein, peptide or corresponding antibody is used to form secondary 
immune complexes, as described above. After washing, the secondary immune 
complexes are contacted with a third binding ligand or antibody that has binding affinity 
for the second antibody, again under conditions effective and for a period of time 
sufficient to allow the formation of immune complexes (tertiary immune complexes). 
The third ligand or antibody is linked to a detectable label, allowing detection of the
tertiary immune complexes thus formed. This system may provide for signal
amplification if desired. 

2. Immunohistochemistry 

The antibodies of the present invention may also be used in conjunction with both
fresh-frozen and formalin-fixed, paraffin-embedded tissue blocks prepared for study by
immunohistochemistry (IHC). For example, each tissue block consists of 50 mg of
diabetic tissue. The method of preparing tissue blocks from these particulate specimens
has been successfully used in previous IHC studies of various prognostic factors, and is
well known to those of skill in the art (Brown et al., 1990; Abbondanzo et al., 1990;
Allred et al., 1990).

Briefly, frozen sections may be prepared by rehydrating 50 mg of frozen
"pulverized" diabetic tissue at room temperature in phosphate buffered saline (PBS) in
small plastic capsules; pelleting the particles by centrifugation; resuspending them in a
viscous embedding medium (OCT); inverting the capsule and pelleting again by
centrifugation; snap-freezing in -70°C isopentane; cutting the plastic capsule and
removing the frozen cylinder of tissue; securing the tissue cylinder on a cryostat
microtome chuck; and cutting 25-50 serial sections.


Permanent-sections may be prepared by a similar method involving rehydration of 
the 50 mg sample in a plastic microfuge tube; pelleting; resuspending in 10% formalin for 
4 hours fixation; washing/pelleting; resuspending in warm 2.5% agar; pelleting; cooling 
in ice water to harden the agar; removing the tissue/agar block from the tube; infiltrating 
and embedding the block in paraffin; and cutting up to 50 serial permanent sections.

3. ELISA

As noted, it is contemplated that the encoded proteins or peptides of the invention 
will find utility as immunogens, e.g., in immunohistochemistry and in ELISA assays. 
One evident utility of the encoded antigens and corresponding antibodies is in
immunoassays for the detection of calpain 10 wild-type and mutant proteins, as needed in
diagnosis and prognostic monitoring of type 2 diabetes. 

Immunoassays, in their most simple and direct sense, are binding assays. Certain 
preferred immunoassays are the various types of enzyme linked immunosorbent assays 
(ELISA) and radioimmunoassays (RIA) known in the art. Immunohistochemical 
detection using tissue sections is also particularly useful. However, it will be readily 
appreciated that detection is not limited to such techniques, and western blotting, dot 
blotting, FACS analyses, and the like may also be used. 

In one exemplary ELISA, antibodies binding to the encoded proteins of the
invention are immobilized onto a selected surface exhibiting protein affinity, such as a
well in a polystyrene microtiter plate. Then, a test composition suspected of containing
the diapain mutant, such as a clinical sample, is added to the wells. After binding and
washing to remove non-specifically bound immune complexes, the bound antigen may
be detected. Detection is generally achieved by the addition of a second antibody, specific
for the target protein, that is linked to a detectable label. This type of ELISA is a simple
"sandwich ELISA". Detection may also be achieved by the addition of a second
antibody, followed by the addition of a third antibody that has binding affinity for the
second antibody, with the third antibody being linked to a detectable label.

In another exemplary ELISA, the samples suspected of containing the calpain 10
antigen are immobilized onto the well surface and then contacted with the antibodies of
the invention. After binding and washing to remove non-specifically bound immune
complexes, the bound antibody is detected. Where the initial antibodies are linked to a
detectable label, the immune complexes may be detected directly. Alternatively, the immune
complexes may be detected using a second antibody that has binding affinity for the first
antibody, with the second antibody being linked to a detectable label.






Another ELISA, in which the proteins or peptides are immobilized, involves the
use of antibody competition in the detection. In this ELISA, labeled antibodies are added
to the wells, allowed to bind to the calpain 10 protein, and detected by means of their
label. The amount of marker antigen in an unknown sample is then determined by
mixing the sample with the labeled antibodies before or during incubation with coated
wells. The presence of marker antigen in the sample acts to reduce the amount of
antibody available for binding to the well and thus reduces the ultimate signal. This format
is also appropriate for detecting antibodies in an unknown sample, where the unlabeled
antibodies bind to the antigen-coated wells and thereby reduce the amount of antigen
available to bind the labeled antibodies.
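
The signal reduction described for the competition format can be expressed as a percent inhibition relative to a well that received labeled antibody alone. The following is a minimal sketch under that assumption; the function name, the blank correction and the absorbance values are hypothetical and are not taken from the text.

```python
def percent_inhibition(signal_with_sample, signal_no_competitor, blank=0.0):
    """Percent reduction in signal caused by competing antigen in the sample.

    signal_no_competitor is the reading from labeled antibody alone;
    blank is an optional reading from a well with no labeled antibody.
    """
    max_signal = signal_no_competitor - blank
    if max_signal <= 0:
        raise ValueError("uninhibited signal must exceed the blank")
    return 100.0 * (1.0 - (signal_with_sample - blank) / max_signal)


# Hypothetical absorbance readings: the sample reduces the signal
# from 1.25 to 0.35, with a blank of 0.05.
inhibition = percent_inhibition(0.35, 1.25, blank=0.05)
print(f"{inhibition:.1f}% inhibition")
```

A higher percent inhibition corresponds to more marker antigen in the unknown sample; converting it to an absolute amount would require a set of standards of known concentration.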

Irrespective of the format employed, ELISAs have certain features in common, 
such as coating, incubating or binding, washing to remove non-specifically bound 
species, and detecting the bound immune complexes. These are described as follows: 

In coating a plate with either antigen or antibody, one will generally incubate the 
wells of the plate with a solution of the antigen or antibody, either overnight or for a 
specified period of hours. The wells of the plate will then be washed to remove 
incompletely adsorbed material. Any remaining available surfaces of the wells are then 
"coated" with a nonspecific protein that is antigenically neutral with regard to the test 
antisera. These include bovine serum albumin (BSA), casein and solutions of milk 
powder. The coating of nonspecific adsorption sites on the immobilizing surface reduces 
the background caused by nonspecific binding of antisera to the surface. 

In ELISAs, it is probably more customary to use a secondary or tertiary detection
means rather than a direct procedure. Thus, after binding of a protein or antibody to the
well, coating with a non-reactive material to reduce background, and washing to remove
unbound material, the immobilizing surface is contacted with the control, type 2 diabetes
and/or clinical or biological sample to be tested under conditions effective to allow
immune complex (antigen/antibody) formation. Detection of the immune complex then
requires a labeled secondary binding ligand or antibody, or a secondary binding ligand or
antibody in conjunction with a labeled tertiary antibody or third binding ligand.

"Under conditions effective to allow immune complex (antigen/antibody) 
formation" means that the conditions preferably include diluting the antigens and
antibodies with solutions such as BSA, bovine gamma globulin (BGG) and phosphate 
buffered saline (PBS)/Tween™. These added agents also tend to assist in the reduction of 
nonspecific background. 

The "suitable" conditions also mean that the incubation is at a temperature and for
a period of time sufficient to allow effective binding. Incubation steps are typically from
about 1 to 4 hours, at temperatures preferably on the order of 25° to 27°C, or may be
overnight at about 4°C.

Following all incubation steps in an ELISA, the contacted surface is washed so as
to remove non-complexed material. A preferred washing procedure includes washing
with a solution such as PBS/Tween™, or borate buffer. Following the formation of
specific immune complexes between the test sample and the originally bound material,
and subsequent washing, the occurrence of even minute amounts of immune complexes
may be determined.

To provide a detecting means, the second or third antibody will have an associated 
label to allow detection. Preferably, this label will be an enzyme that will generate color 
development upon incubating with an appropriate chromogenic substrate. Thus, for 
example, one will desire to contact and incubate the first or second immune complex with
a urease, glucose oxidase, alkaline phosphatase or hydrogen peroxidase-conjugated 
antibody for a period of time and under conditions that favor the development of further 
immune complex formation (e.g., incubation for 2 hours at room temperature in a PBS- 
containing solution such as PBS-Tween™). 

After incubation with the labeled antibody, and subsequent to washing to remove
unbound material, the amount of label is quantified, e.g., by incubation with a
chromogenic substrate such as urea and bromocresol purple or 2,2'-azino-di-(3-ethyl-
benzthiazoline-6-sulfonic acid) [ABTS] and H2O2, in the case of peroxidase as the enzyme
label. Quantitation is then achieved by measuring the degree of color generation, e.g.,
using a visible spectrum spectrophotometer.
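
Converting a measured degree of color generation into an amount of antigen is usually done against standards of known concentration. The sketch below uses simple linear interpolation between neighboring standards; that choice, the function name and all numerical values are assumptions made for illustration, and in practice a fitted curve (for example a four-parameter logistic) is often preferred.

```python
def interpolate_concentration(absorbance, standards):
    """Estimate concentration from absorbance by piecewise-linear interpolation.

    standards is a list of (concentration, absorbance) pairs in which
    absorbance is assumed to rise with concentration.
    """
    points = sorted(standards, key=lambda pair: pair[1])
    for (c0, a0), (c1, a1) in zip(points, points[1:]):
        if a0 <= absorbance <= a1:
            if a1 == a0:
                return c0
            return c0 + (absorbance - a0) * (c1 - c0) / (a1 - a0)
    raise ValueError("absorbance outside the range of the standard curve")


# Hypothetical standards: (concentration in ng/ml, absorbance).
standards = [(0, 0.05), (10, 0.25), (50, 0.85), (100, 1.45)]
estimate = interpolate_concentration(0.55, standards)
print(f"estimated concentration: {estimate:.1f} ng/ml")
```

Readings that fall outside the range of the standards raise an error rather than being extrapolated, since such samples would normally be diluted and re-assayed.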

4. Use of Antibodies for Radioimaging 

The antibodies of this invention will be used to quantify and localize the
expression of the encoded marker proteins. The antibody, for example, will be labeled by
any one of a variety of methods and used to visualize the localized concentration of the
cells producing the encoded protein. Such an assay also will reveal the subcellular
localization of the protein, which can have diagnostic and therapeutic applications.

In accordance with this invention, the monoclonal antibody or fragment thereof
may be labeled by any of several techniques known to the art. The methods of the present
invention may also use paramagnetic isotopes for purposes of in vivo detection. Elements
particularly useful in Magnetic Resonance Imaging ("MRI") include 157Gd, 55Mn, 162Dy,
52Cr, and 56Fe.

Administration of the labeled antibody may be local or systemic and
accomplished intravenously, intraarterially, via the spinal fluid or the like.
Administration may also be intradermal or intracavitary, depending upon the body site
under examination. After a sufficient time has lapsed for the monoclonal antibody or
fragment thereof to bind with the diseased tissue, for example, 30 minutes to 48 hours,
the area of the subject under investigation is examined by routine imaging techniques
such as MRI, SPECT, planar scintillation imaging or newly emerging imaging
techniques. The exact protocol will necessarily vary depending upon factors specific to
the patient, as noted above, and depending upon the body site under examination, method
of administration and type of label used; the determination of specific procedures would
be routine to the skilled artisan. The distribution of the bound radioactive isotope and its
increase or decrease with time is then monitored and recorded. By comparing the results
with data obtained from studies of clinically normal individuals, the presence and extent
of the diseased tissue can be determined.

It will be apparent to those of skill in the art that a similar approach may be used
to radio-image the production of the encoded calpain 10 proteins in human patients. The
present invention provides methods for the in vivo diagnosis of type 2 diabetes in a
patient. Such methods generally comprise administering to a patient an effective amount
of a calpain 10-specific antibody, to which antibody is conjugated a marker, such as a
radioactive isotope or a spin-labeled molecule, that is detectable by non-invasive
methods. The antibody-marker conjugate is allowed sufficient time to come into contact
with reactive antigens that are present within the tissues of the patient, and the patient is
then exposed to a detection device to identify the detectable marker.


5. Kits 

In still further embodiments, the present invention concerns immunodetection kits 
for use with the immunodetection methods described above. As the encoded proteins or 
peptides may be employed to detect antibodies and the corresponding antibodies may be 
employed to detect encoded proteins or peptides, either or both of such components may
be provided in the kit. The immunodetection kits will thus comprise, in suitable 
container means, an encoded protein or peptide, or a first antibody that binds to an 
encoded protein or peptide, and an immunodetection reagent. 

In certain embodiments, the encoded protein or peptide, or the first antibody that
binds to the encoded protein or peptide, may be bound to a solid support, such as a
column matrix or well of a microtiter plate.

The immunodetection reagents of the kit may take any one of a variety of forms, 
including those detectable labels that are associated with or linked to the given antibody
or antigen, and detectable labels that are associated with or attached to a secondary
binding ligand. Exemplary secondary ligands are those secondary antibodies that have 
binding affinity for the first antibody or antigen, and secondary antibodies that have 
binding affinity for a human antibody. 

Further suitable immunodetection reagents for use in the present kits include the 
two-component reagent that comprises a secondary antibody that has binding affinity for 
the first antibody or antigen, along with a third antibody that has binding affinity for the 
second antibody, the third antibody being linked to a detectable label. 

The kits may further comprise a suitably aliquoted composition of the encoded 
protein or polypeptide antigen, whether labeled or unlabeled, as may be used to prepare a 
standard curve for a detection assay. 

The kits may contain antibody-label conjugates either in fully conjugated form, in 
the form of intermediates, or as separate moieties to be conjugated by the user of the kit. 
The components of the kits may be packaged either in aqueous media or in lyophilized 
form. 

The container means of the kits will generally include at least one vial, test tube, 
flask, bottle, syringe or other container means, into which the antibody or antigen may be 
placed, and preferably, suitably aliquoted. Where a second or third binding ligand or 
additional component is provided, the kit will also generally contain a second, third or 
other additional container into which this ligand or component may be placed. The kits 
of the present invention will also typically include a means for containing the antibody, 
antigen, and any other reagent containers in close confinement for commercial sale. Such 
containers may include injection or blow-molded plastic containers into which the desired 
vials are retained. 

J. Methods for Screening Active Compounds

The present invention also contemplates the use of calpain 10 and active
fragments, and nucleic acids coding therefor, in the screening of compounds for activity
in either stimulating calpain 10 activity, overcoming the lack of calpain 10 activity or
blocking the effect of a calpain 10 molecule. These assays may make use of a variety of
different formats and may depend on the kind of "activity" for which the screen is being
conducted. Contemplated functional "read-outs" include binding to a compound, and
inhibition of binding to a substrate, ligand, receptor or other binding partner by a
compound.

Compounds thus identified will be capable of promoting gene expression, and
thus can be said to have up-regulating activity. Inasmuch as decreased levels of calpain
10 indicate an increased susceptibility to type 2 diabetes, any positive substances
identified by the assays of the present invention will be anti-diabetic drugs. Before
human administration, such compounds would be rigorously tested using conventional
animal models known to those of skill in the art.

As stated earlier, the present invention provides the complete sequence of the
calpain 10 gene. The sequence predicts a protein with extensive homology with
representative members of the large subunit calpain family. The calpain 10 protein acts in
concert with the protein product of an unknown gene on chromosome 15 to increase
susceptibility to type 2 diabetes. Thus, in certain embodiments, the binding partner for
calpain 10 may be the protein encoded by a gene on chromosome 15. This gene may be
involved in diabetes. Thus the present invention also will be useful in isolating and
identifying the gene on chromosome 15 that has long since been suspected to be involved
in diabetes. Alternatively, the binding partner may be any agent or protein that is cleaved
by the action of the protease.

Virtually any candidate substance may be analyzed by these methods, including 
compounds which may interact with calpain 10, calpain 10 binding protein(s), and
substances such as enzymes which may act by physically altering one of the structures
present. Of course, any compound isolated from natural sources such as plants, animals 
or even marine, forest, or soil samples, may be assayed, as may any synthetic chemical or 
recombinant protein. 

1. In Vitro Assays

In one embodiment, the invention is to be applied for the screening of compounds
that bind to the calpain 10 wild-type molecule, mutant or fragment thereof. The wild-type
or mutant polypeptide or fragment may be either free in solution, fixed to a support, or
expressed in or on the surface of a cell. Either the polypeptide or the compound may be
labeled, thereby permitting determination of binding.

In another embodiment, the assay may measure the inhibition of binding of 
calpain 10 to a natural or artificial substrate or binding partner. Competitive binding 
assays can be performed in which one of the agents (calpain 10, binding partner or
compound) is labeled. Usually, the polypeptide will be the labeled species. One may 
measure the amount of free label versus bound label to determine binding or inhibition of 
binding. 
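
As an illustration of how free versus bound label might be turned into a measure of inhibition, the sketch below computes the fraction of label bound at each compound concentration and estimates the concentration at which binding falls to half of the control value. The interpolation on a logarithmic concentration scale, the function names and the counts are all assumptions of this example and are not taken from the text.

```python
import math


def fraction_bound(bound_counts, free_counts):
    """Fraction of total label recovered in the bound pool."""
    total = bound_counts + free_counts
    if total <= 0:
        raise ValueError("total label must be positive")
    return bound_counts / total


def half_inhibition_concentration(concentrations, fractions, control_fraction):
    """Interpolate, on log10 concentration, where binding is half the control.

    Returns None when the tested range never crosses the half-control level.
    """
    half = control_fraction / 2.0
    points = list(zip(concentrations, fractions))
    for (c0, f0), (c1, f1) in zip(points, points[1:]):
        if f0 >= half >= f1 and f0 != f1:
            t = (f0 - half) / (f0 - f1)
            log_c = math.log10(c0) + t * (math.log10(c1) - math.log10(c0))
            return 10 ** log_c
    return None


# Hypothetical counts (bound, free) without compound and at three
# increasing compound concentrations (arbitrary units).
control = fraction_bound(8000, 2000)
concentrations = [1.0, 10.0, 100.0]
fractions = [fraction_bound(b, f) for b, f in [(7000, 3000), (4000, 6000), (1000, 9000)]]
ic50 = half_inhibition_concentration(concentrations, fractions, control)
print(ic50)
```

A compound that never reduces binding to half of the control within the tested range returns None, which would flag it as a weak or inactive competitor under these assumptions.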

Another technique for high throughput screening of compounds is described in
WO 84/03564. Large numbers of small peptide test compounds are synthesized on a
solid substrate, such as plastic pins or some other surface. The peptide test compounds 
are reacted with calpain 10 and washed. Bound polypeptide is detected by various 
methods. 

Purified calpain 10 can be coated directly onto plates for use in the
aforementioned drug screening techniques. Alternatively, non-neutralizing antibodies to the
polypeptide can be used to immobilize the polypeptide to a solid phase. Also, fusion
proteins containing a reactive region (preferably a terminal region) may be used to link
the calpain 10 active region to a solid phase.

Various cell lines containing wild-type or natural or engineered mutations in
calpain 10 can be used to study various functional attributes of calpain 10 and how a
candidate compound affects these attributes. Methods for engineering mutations are
described elsewhere in this document, as are naturally-occurring mutations in calpain 10
that lead to, contribute to and/or otherwise cause diabetes. In such assays, the compound
would be formulated appropriately, given its biochemical nature, and contacted with a
target cell. Depending on the assay, culture may be required. The cell may then be
examined by virtue of a number of different physiologic assays. Alternatively, molecular
analysis may be performed in which the function of calpain 10, or related pathways, may
be explored. This may involve assays such as those for protein expression, enzyme
function, substrate utilization, phosphorylation states of various molecules, cAMP levels,
mRNA expression (including differential display of whole cell or polyA RNA) and
others.


2. In Vivo Assays 

The present invention also encompasses the use of various animal models. Here, 
the identity seen between calpain 10 and other calpains provides an excellent opportunity 
to examine the function of calpain 10 in relation to other proteases in a whole animal 
system where it is normally expressed. By developing or isolating mutant cell lines that
fail to express normal calpain 10, one can generate diabetes models in mice that will be 
highly predictive of diabetes in humans and other mammals. 

Alternatively, one may increase the susceptibility of an animal to diabetes by 
providing agents known to be responsible for this susceptibility, i.e., providing a mutant
calpain 10. Finally, transgenic animals (discussed below) that lack a wild-type calpain 10 
may be utilized as models for type 2 diabetes development and treatment. 

Treatment of animals with test compounds will involve the administration of the
compound, in an appropriate form, to the animal. Administration will be by any route that
could be utilized for clinical or non-clinical purposes, including but not limited to oral,
nasal, buccal, rectal, vaginal or topical. Alternatively, administration may be by
intratracheal instillation, bronchial instillation, intradermal, subcutaneous, intramuscular,
intraperitoneal or intravenous injection. Specifically contemplated are systemic
intravenous injection and regional administration via blood or lymph supply.

Determining the effectiveness of a compound in vivo may involve a variety of 
different criteria. Such criteria include, but are not limited to, survival, improvement of 
hyperglycemia, diminished need for hypoglycemic agents, diminished insulin
requirements, increased insulin synthesis, improved protease activity, improvement in
immune effector function and improved food intake. 

3. Reporter Genes and Cell-Based Screening Assays

Cellular assays also are available for screening candidate substances to identify 
those capable of stimulating calpain 10 activity and gene expression. In these assays, the 
increased expression of any natural or heterologous gene under the control of a functional 
calpain 10 promoter may be employed as a measure of stimulatory activity, although the 
use of reporter genes is preferred. 

A reporter gene is a gene that confers on its recombinant host cell a readily 
detectable phenotype that emerges only under specific conditions. In the present case, the 
reporter gene may be placed under the control of the same promoter as the calpain 10 and 
will thus generally be repressed under conditions where the calpain 10 is not being 
expressed and will generally be expressed in the conditions where calpain 10 is being 
expressed. 

Reporter genes are genes which encode a polypeptide not otherwise produced by 
the host cell which is detectable by analysis of the cell culture, e.g., by fluorometric, 
radioisotopic or spectrophotometric analysis of the cell culture. Exemplary enzymes 
include luciferases, transferases, esterases, phosphatases, proteases (tissue plasminogen 
activator or urokinase), and other enzymes capable of being detected by their physical
presence or functional activity. A reporter gene often used is chloramphenicol 
acetyltransferase (CAT) which may be employed with a radiolabeled substrate, or 
luciferase, which is measured luminometrically.


Another class of reporter genes which confer detectable characteristics on a host 
cell are those which encode polypeptides, generally enzymes, which render their 
transformants resistant against toxins, e.g., the neo gene which protects host cells against 
toxic levels of the antibiotic G418, and genes encoding dihydrofolate reductase, which 
confers resistance to methotrexate. Genes of this class are not generally preferred since
the phenotype (resistance) does not provide a convenient or rapid quantitative output. 
Resistance to antibiotic or toxin requires days of culture to confirm, or complex assay 
procedures if other than a biological determination is to be made. 

Other genes of potential for use in screening assays are those capable of
transforming hosts to express unique cell surface antigens, e.g., viral env proteins such as
HIV gp120 or herpes gD, which are readily detectable by immunoassays. However,
antigenic reporters are not preferred because, unlike enzymes, they are not catalytic and
thus do not amplify their signals.


The polypeptide products of the reporter gene are secreted, intracellular or, as
noted above, membrane bound polypeptides. If the polypeptide is not ordinarily secreted
it is fused to a heterologous signal sequence for processing and secretion. In other
circumstances the signal is modified in order to remove sequences that interdict secretion.
For example, the herpes gD coat protein has been modified by site directed deletion of its
transmembrane binding domain, thereby facilitating its secretion (EP 139,417A). This
truncated form of the herpes gD protein is detectable in the culture medium by
conventional immunoassays. Preferably, however, the products of the reporter gene are
lodged in the intracellular or membrane compartments. Then they can be fixed to the
culture container, e.g., microtiter wells, in which they are grown, followed by addition of
a detectable signal generating substance such as a chromogenic substrate for reporter
enzymes.

To create an appropriate vector or plasmid for use in such assays one would ligate
the promoter, whether a hybrid or the native diapain-1 promoter, to a DNA segment
encoding the reporter gene by conventional methods. The diapain-1 promoter sequences
may be obtained by in vitro synthesis or recovered from genomic DNA and should be
ligated upstream of the start codon of the reporter gene. The present invention provides
the promoter region for human calpain 10 gene. The sequences associated with the novel
calpain 10 gene of the present invention are shown in Appendix A, including the calpain
10 promoter region. Any of these promoters may be particularly preferred in the present
invention. An AT-rich TATA box region should also be employed and should be located
between the calpain 10 sequence and the reporter gene start codon. The region 3' to the
coding sequence for the reporter gene will ideally contain a transcription termination and
polyadenylation site. The promoter and reporter gene may be inserted into a replicable
vector and transfected into a cloning host such as E. coli, the host cultured and the
replicated vector recovered in order to prepare sufficient quantities of the construction for
later transfection into a suitable eukaryotic host.

Host cells for use in the screening assays of the present invention will generally be
mammalian cells, and are preferably cell lines which may be used in connection with
transient transfection studies. Cell lines should be relatively easy to grow in large scale
culture. Also, they should contain as little native background as possible considering the
nature of the reporter polypeptide. Examples include the Hep G2, VERO, HeLa, human
embryonic kidney (HEK)-293, CHO, WI38, BHK, COS-7, and MDCK cell lines, with
monkey CV-1 cells being particularly preferred.

In one embodiment, the screening assay typically is conducted by growing
recombinant host cells in the presence and absence of candidate substances and
determining the amount or the activity of the reporter gene. To assay for candidate
substances capable of exerting their effects in the presence of calpain 10 gene products,
one would make serial molar proportions of such gene products that alter calpain
10-mediated activity. One would ideally measure the reporter signal level after an incubation
period that is sufficient to demonstrate mutant-mediated repression of signal expression
in controls incubated solely with mutants. Cells containing varying proportions of
candidate substances would then be evaluated for signal activation in comparison to the
suppressed levels.

Candidates that demonstrate dose-related enhancement of reporter gene
transcription or expression are then selected for further evaluation as clinical therapeutic
agents. The stimulation of activity may be observed in the absence of calpain 10, in
which case the candidate compound might be a positive stimulator of calpain 10
expression. Alternatively, the candidate compound might only give a stimulation in the
presence of a calpain 10 protein having the G-allele, which would indicate that it
functions to oppose the G-allele-mediated suppression of activity. Candidate compounds
of either class might be useful therapeutic agents that would combat type 2 diabetes.
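
The selection criterion described above, a reporter signal that rises with candidate dose and ends above the suppressed control level, can be sketched in Python. This is a minimal illustration only: the function name, the fold-change threshold of 1.5 and the requirement that the signal never falls as dose rises are assumptions made for the example and are not specified in the text.

```python
def shows_dose_related_enhancement(doses, signals, control_signal, min_fold=1.5):
    """Return True if reporter signal rises with dose and ends above the control.

    doses          -- candidate concentrations (any consistent unit)
    signals        -- reporter readings measured at those concentrations
    control_signal -- reading from cells incubated without candidate
    min_fold       -- hypothetical threshold for the highest dose (assumption)
    """
    if len(doses) != len(signals) or len(doses) < 2:
        raise ValueError("need at least two matched dose/signal pairs")
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    # Order the readings by increasing dose.
    ordered = [s for _, s in sorted(zip(doses, signals))]
    # Dose-related: signal never decreases as dose increases.
    monotonic = all(b >= a for a, b in zip(ordered, ordered[1:]))
    # Enhancement: the highest dose exceeds the control by the chosen fold.
    enhanced = ordered[-1] / control_signal >= min_fold
    return monotonic and enhanced
```

A candidate whose readings climb steadily across a dilution series would pass this check, whereas one with a flat or erratic response would not; in practice the threshold and the statistical treatment would be chosen for the particular reporter system.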

4. Rational Drug Design 

The goal of rational drug design is to produce structural analogs of biologically
active polypeptides or compounds with which they interact (agonists, antagonists,
inhibitors, binding partners, etc.). By creating such analogs, it is possible to fashion drugs
which are more active or stable than the natural molecules, which have different
susceptibility to alteration or which may affect the function of various other molecules.
In one approach, one would generate a three-dimensional structure for calpain 10 or a
fragment thereof. This could be accomplished by x-ray crystallography, computer
modeling or by a combination of both approaches. An alternative approach, "alanine
scan," involves the random replacement of residues throughout the molecule with alanine,
and determination of the resulting effect on function.

It also is possible to isolate a calpain 10-specific antibody, selected by a functional
assay, and then solve its crystal structure. In principle, this approach yields a pharmacophore
upon which subsequent drug design can be based. It is possible to bypass protein
crystallography altogether by generating anti-idiotypic antibodies to a functional,
pharmacologically active antibody. As a mirror image of a mirror image, the binding site
of the anti-idiotype would be expected to be an analog of the original antigen. The
anti-idiotype could then be used to identify and isolate peptides from banks of chemically- or
biologically-produced peptides. Selected peptides would then serve as the pharmacophore.
Anti-idiotypes may be generated using the methods described herein for producing
antibodies, using an antibody as the antigen.

Thus, one may design drugs which have improved calpain 10 activity or which act
as stimulators, inhibitors, agonists, antagonists of calpain 10 or molecules affected by
calpain 10 function. By virtue of the availability of cloned calpain 10 sequences,
sufficient amounts of calpain 10 can be produced to perform crystallographic studies. In
addition, knowledge of the polypeptide sequences permits computer employed
predictions of structure-function relationships.

K. Detection and Quantitation of Nucleic Acid Species 

One embodiment of the instant invention comprises a method for identification of
calpain 10 mutants in a biological sample by amplifying and detecting nucleic acids
corresponding to calpain 10 mutants. The biological sample can be any tissue or fluid in
which these mutants might be present. Various embodiments include bone marrow
aspirate, bone marrow biopsy, lymph node aspirate, lymph node biopsy, spleen tissue,
fine needle aspirate, skin biopsy or organ tissue biopsy. Other embodiments include
samples where the body fluid is peripheral blood, lymph fluid, ascites, serous fluid,
pleural effusion, sputum, cerebrospinal fluid, lacrimal fluid, stool or urine.

Nucleic acid used as a template for amplification is isolated from cells contained
in the biological sample, according to standard methodologies (Sambrook et al., 1989).
The nucleic acid may be genomic DNA or fractionated or whole cell RNA. Where RNA
is used, it may be desired to convert the RNA to a complementary DNA. In one
embodiment, the RNA is whole cell RNA and is used directly as the template for
amplification.

Pairs of primers that selectively hybridize to nucleic acids corresponding to
calpain 10 mutants are contacted with the isolated nucleic acid under conditions that
permit selective hybridization. Once hybridized, the nucleic acid:primer complex is
contacted with one or more enzymes that facilitate template-dependent nucleic acid
synthesis. Multiple rounds of amplification, also referred to as "cycles," are conducted
until a sufficient amount of amplification product is produced.

Next, the amplification product is detected. In certain applications, the detection
may be performed by visual means. Alternatively, the detection may involve indirect
identification of the product via chemiluminescence, radioactive scintigraphy of
incorporated radiolabel or fluorescent label or even via a system using electrical or
thermal impulse signals (Affymax technology; Bellus, 1994).

Following detection, one may compare the results seen in a given patient with a
reference group of normal subjects or indeed patients with type 2 and type 1 diabetes. In
this way, it is possible to correlate the amount of calpain 10 mutants detected with various
clinical states.

1. Primers

The term primer, as defined herein, is meant to encompass any nucleic acid that is
capable of priming the synthesis of a nascent nucleic acid in a template-dependent
process. Typically, primers are oligonucleotides from ten to twenty base pairs in length,
but longer sequences can be employed. Primers may be provided in double-stranded or
single-stranded form, although the single-stranded form is preferred.

2. Template Dependent Amplification Methods

A number of template dependent processes are available to amplify the marker 
sequences present in a given template sample. One of the best known amplification 
methods is the polymerase chain reaction (referred to as PCR) which is described in detail 
in U.S. Patent Nos. 4,683,195, 4,683,202 and 4,800,159, and in Innis et al., 1990, each of
which is incorporated herein by reference in its entirety. 

Briefly, in PCR, two primer sequences are prepared that are complementary to 
regions on opposite complementary strands of the marker sequence. An excess of
deoxynucleoside triphosphates is added to a reaction mixture along with a DNA
polymerase, e.g., Taq polymerase. If the marker sequence is present in a sample, the 
primers will bind to the marker and the polymerase will cause the primers to be extended 
along the marker sequence by adding on nucleotides. By raising and lowering the 
temperature of the reaction mixture, the extended primers will dissociate from the marker 
to form reaction products, excess primers will bind to the marker and to the reaction 
products and the process is repeated. 
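
The primer arrangement described above (one primer matching each strand, flanking the region to be amplified) can be illustrated with a short Python sketch that predicts the product from a template sequence. The function names and the example sequences are hypothetical, and only exact matches are considered; real primer binding tolerates mismatches and depends on reaction conditions.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def predict_amplicon(template, forward_primer, reverse_primer):
    """Return the region of `template` bounded by the two primers, or None.

    `template` is the top strand written 5'->3'. The forward primer matches the
    top strand; the reverse primer matches the bottom strand, so its reverse
    complement is searched for on the top strand. Exact matching only.
    """
    template = template.upper()
    start = template.find(forward_primer.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    # The reverse primer site must lie downstream of the forward primer.
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]
```

For the made-up template `AACCGTGATTACAGGCTTAGCCATG` with primers `CCGTGA` and `TGGCTA`, the function returns the 21-base region spanning both primer sites; if either primer has no exact site it returns `None`, corresponding to the case where the marker sequence is absent from the sample.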

A reverse transcriptase PCR amplification procedure may be performed in order 
to quantify the amount of mRNA amplified. Methods of reverse transcribing RNA into 
cDNA are well known and described in Sambrook et al., 1989. Alternative methods for
reverse transcription utilize thermostable, RNA-dependent DNA polymerases. These 
methods are described in WO 90/07641 filed December 21, 1990. Polymerase chain 
reaction methodologies are well known in the art. 

Another method for amplification is the ligase chain reaction ("LCR"), disclosed 
in EPA No. 320 308, incorporated herein by reference in its entirety. In LCR, two 
complementary probe pairs are prepared, and in the presence of the target sequence, each 
pair will bind to opposite complementary strands of the target such that they abut. In the 
presence of a ligase, the two probe pairs will link to form a single unit. By temperature 
cycling, as in PCR, bound ligated units dissociate from the target and then serve as "target 
sequences" for ligation of excess probe pairs. U.S. Patent 4,883,750 describes a method
similar to LCR for binding probe pairs to a target sequence. 

Qbeta Replicase, described in PCT Application No. PCT/US87/00880, may also
be used as still another amplification method in the present invention. In this method, a
replicative sequence of RNA that has a region complementary to that of a target is added
to a sample in the presence of an RNA polymerase. The polymerase will copy the
replicative sequence that can then be detected.

An isothermal amplification method, in which restriction endonucleases and
ligases are used to achieve the amplification of target molecules that contain nucleotide
5'-[alpha-thio]-triphosphates in one strand of a restriction site, may also be useful in the
amplification of nucleic acids in the present invention (Walker et al., 1992), incorporated
herein by reference in its entirety.

Strand Displacement Amplification (SDA) is another method of carrying out
isothermal amplification of nucleic acids which involves multiple rounds of strand
displacement and synthesis, i.e., nick translation. A similar method, called Repair Chain
Reaction (RCR), involves annealing several probes throughout a region targeted for
amplification, followed by a repair reaction in which only two of the four bases are
present. The other two bases can be added as biotinylated derivatives for easy detection.
A similar approach is used in SDA. Target specific sequences can also be detected using
a cyclic probe reaction (CPR). In CPR, a probe having 3' and 5' sequences of
non-specific DNA and a middle sequence of specific RNA is hybridized to DNA that is
present in a sample. Upon hybridization, the reaction is treated with RNase H, and the
products of the probe identified as distinctive products that are released after digestion.
The original template is annealed to another cycling probe and the reaction is repeated.

Still other amplification methods, described in GB Application No. 2 202 328
and in PCT Application No. PCT/US89/01025, each of which is incorporated herein by
reference in its entirety, may be used in accordance with the present invention. In the
former application, "modified" primers are used in a PCR-like, template- and
enzyme-dependent synthesis. The primers may be modified by labelling with a capture moiety
(e.g., biotin) and/or a detector moiety (e.g., enzyme). In the latter application, an excess
of labeled probes is added to a sample. In the presence of the target sequence, the probe
binds and is cleaved catalytically. After cleavage, the target sequence is released intact to
be bound by excess probe. Cleavage of the labeled probe signals the presence of the
target sequence.

Other nucleic acid amplification procedures include transcription-based
amplification systems (TAS), including nucleic acid sequence based amplification
(NASBA) and 3SR (Kwoh et al., 1989; Gingeras et al., PCT Application WO 88/10315,
incorporated herein by reference in their entirety). In NASBA, the nucleic acids can be
prepared for amplification by standard phenol/chloroform extraction, heat denaturation of
a clinical sample, treatment with lysis buffer and minispin columns for isolation of DNA
and RNA or guanidinium chloride extraction of RNA. These amplification techniques
involve annealing a primer which has target specific sequences. Following
polymerization, DNA/RNA hybrids are digested with RNase H while double stranded
DNA molecules are heat denatured again. In either case the single stranded DNA is made
fully double stranded by addition of second target specific primer, followed by
polymerization. The double-stranded DNA molecules are then multiply transcribed by an
RNA polymerase such as T7 or SP6. In an isothermal cyclic reaction, the RNAs are
reverse transcribed into single stranded DNA, which is then converted to double stranded
DNA, and then transcribed once again with an RNA polymerase such as T7 or SP6. The
resulting products, whether truncated or complete, indicate target specific sequences.

Davey et al., EPA No. 329 822 (incorporated herein by reference in its entirety)
disclose a nucleic acid amplification process involving cyclically synthesizing
single-stranded RNA ("ssRNA"), ssDNA, and double-stranded DNA (dsDNA), which may be
used in accordance with the present invention. The ssRNA is a template for a first primer
oligonucleotide, which is elongated by reverse transcriptase (RNA-dependent DNA
polymerase). The RNA is then removed from the resulting DNA:RNA duplex by the
action of ribonuclease H (RNase H, an RNase specific for RNA in duplex with either
DNA or RNA). The resultant ssDNA is a template for a second primer, which also
includes the sequences of an RNA polymerase promoter (exemplified by T7 RNA
polymerase) 5' to its homology to the template. This primer is then extended by DNA
polymerase (exemplified by the large "Klenow" fragment of E. coli DNA polymerase I),
resulting in a double-stranded DNA ("dsDNA") molecule, having a sequence identical to
that of the original RNA between the primers and having additionally, at one end, a
promoter sequence. This promoter sequence can be used by the appropriate RNA
polymerase to make many RNA copies of the DNA. These copies can then re-enter the
cycle leading to very swift amplification. With proper choice of enzymes, this
amplification can be done isothermally without addition of enzymes at each cycle.
Because of the cyclical nature of this process, the starting sequence can be chosen to be in
the form of either DNA or RNA.

Miller et al., PCT Application WO 89/06700 (incorporated herein by reference in
its entirety) disclose a nucleic acid sequence amplification scheme based on the
hybridization of a promoter/primer sequence to a target single-stranded DNA ("ssDNA")
followed by transcription of many RNA copies of the sequence. This scheme is not
cyclic, i.e., new templates are not produced from the resultant RNA transcripts. Other
amplification methods include "RACE" and "one-sided PCR" (Frohman, M.A., In: PCR
PROTOCOLS: A GUIDE TO METHODS AND APPLICATIONS, Academic Press, N.Y.,
1990; Ohara et al., 1989; each herein incorporated by reference in their entirety).

Methods based on ligation of two (or more) oligonucleotides in the presence of
nucleic acid having the sequence of the resulting "di-oligonucleotide", thereby amplifying
the di-oligonucleotide, may also be used in the amplification step of the present
invention (Wu et al., 1989), incorporated herein by reference in its entirety.

3. RNase Protection Assay

Methods for genetic screening by identifying mutations associated with most
genetic diseases such as diabetes must be able to assess large regions of the genome.
Once a relevant mutation has been identified in a given patient, other family members and
affected individuals can be screened using methods which are targeted to that site. The
ability to detect dispersed point mutations is critical for genetic counseling, diagnosis, and
early clinical intervention as well as for research into the etiology of cancer and other
genetic disorders. The ideal method for genetic screening would quickly, inexpensively,
and accurately detect all types of widely dispersed mutations in genomic DNA, cDNA,
and RNA samples, depending on the specific situation.

Historically, a number of different methods have been used to detect point
mutations, including denaturing gradient gel electrophoresis ("DGGE"), restriction
enzyme polymorphism analysis, chemical and enzymatic cleavage methods, and others
(Cotton, 1989). The more common procedures currently in use include direct sequencing
of target regions amplified by PCR™ and single-strand conformation polymorphism
analysis ("SSCP").

Another method of screening for point mutations is based on RNase cleavage of
base pair mismatches in RNA/DNA and RNA/RNA heteroduplexes. As used herein, the
term "mismatch" is defined as a region of one or more unpaired or mispaired nucleotides
in a double-stranded RNA/RNA, RNA/DNA or DNA/DNA molecule. This definition
thus includes mismatches due to insertion/deletion mutations, as well as single and
multiple base point mutations. U.S. Patent No. 4,946,773 describes an RNase A
mismatch cleavage assay that involves annealing single-stranded DNA or RNA test
samples to an RNA probe, and subsequent treatment of the nucleic acid duplexes with
RNase A. After the RNase cleavage reaction, the RNase is inactivated by proteolytic
digestion and organic extraction, and the cleavage products are denatured by heating and
analyzed by electrophoresis on denaturing polyacrylamide gels. For the detection of
mismatches, the single-stranded products of the RNase A treatment, electrophoretically
separated according to size, are compared to similarly treated control duplexes. Samples
containing smaller fragments (cleavage products) not seen in the control duplex are
scored as +.

Currently available RNase mismatch cleavage assays, including those performed
according to U.S. Patent No. 4,946,773, require the use of radiolabeled RNA probes.
Myers and Maniatis in U.S. Patent No. 4,946,773 describe the detection of base pair
mismatches using RNase A. Other investigators have described the use of an E. coli enzyme,
RNase I, in mismatch assays. Because it has broader cleavage specificity than RNase A,
RNase I would be a desirable enzyme to employ in the detection of base pair mismatches
if components can be found to decrease the extent of non-specific cleavage and increase
the frequency of cleavage of mismatches. The use of RNase I for mismatch detection is
described in literature from Promega Biotech. Promega markets a kit containing RNase I
that is shown in their literature to cleave three out of four known mismatches, provided
the enzyme level is sufficiently high.

The RNase protection assay as first described by Melton et al. (1984) was used to
detect and map the ends of specific mRNA targets in solution. The assay relies on being 
able to easily generate high specific activity radiolabeled RNA probes complementary to 
the mRNA of interest by in vitro transcription. Originally, the templates for in vitro 
transcription were recombinant plasmids containing bacteriophage promoters. The 
probes are mixed with total cellular RNA samples to permit hybridization to their 
complementary targets, then the mixture is treated with RNase to degrade excess 
unhybridized probe. Also, as originally intended, the RNase used is specific for single- 
stranded RNA, so that hybridized double-stranded probe is protected from degradation. 
After inactivation and removal of the RNase, the protected probe (which is proportional 
in amount to the amount of target mRNA that was present) is recovered and analyzed on 
a polyacrylamide gel. 

The RNase Protection assay was adapted for detection of single base mutations by
Myers and Maniatis (1985) and by Winter and Perucho (1985). In this type of RNase A
mismatch cleavage assay, radiolabeled RNA probes transcribed in vitro from wild type
sequences are hybridized to complementary target regions derived from test samples.
The test target generally comprises DNA (either genomic DNA or DNA amplified by
cloning in plasmids or by PCR™), although RNA targets (endogenous mRNA) have
occasionally been used (Gibbs and Caskey, 1987; Winter and Perucho, 1985). If single
nucleotide (or greater) sequence differences occur between the hybridized probe and
target, the resulting disruption in Watson-Crick hydrogen bonding at that position
("mismatch") can be recognized and cleaved in some cases by single-strand specific
ribonuclease. To date, RNase A has been used almost exclusively for cleavage of
single-base mismatches, although RNase I has recently been shown as useful also for mismatch
cleavage. There are recent descriptions of using the MutS protein and other DNA-repair
enzymes for detection of single-base mismatches (Ellis et al., 1994; Lishanski et al.,
1994).

By hybridizing each strand of the wild type probe in RNase cleavage mismatch
assays separately to the complementary sense and antisense strands of the test target, two
different complementary mismatches (for example, A-C and G-U or G-T), and therefore
two chances for detecting each mutation by separate cleavage events, are provided.
Myers et al. (1985) used the RNase A cleavage assay to screen 615 bp regions of the
human β-globin gene contained in recombinant plasmid targets. By probing with both
strands, they were able to detect most, but not all, of the β-globin mutations in their
model system. The collection of mutants included examples of all the 12 possible types
of mismatches between RNA and DNA: rA/dA, rC/dC, rU/dC, rC/dA, rC/dT, rU/dG,
rG/dA, rG/dG, rU/dT, rA/dC, rG/dT, and rA/dG.
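
The figure of 12 mismatch types follows from simple enumeration: four RNA bases paired against four DNA bases give sixteen combinations, of which four are Watson-Crick pairs. The short Python sketch below makes the count explicit; the function name is illustrative only.

```python
from itertools import product

RNA_BASES = "ACGU"
DNA_BASES = "ACGT"
# Watson-Crick partners for an RNA base paired with a DNA base.
WATSON_CRICK = {("A", "T"), ("C", "G"), ("G", "C"), ("U", "A")}


def rna_dna_mismatches():
    """List every non-Watson-Crick RNA/DNA base pairing in rX/dY notation."""
    return [
        f"r{r}/d{d}"
        for r, d in product(RNA_BASES, DNA_BASES)
        if (r, d) not in WATSON_CRICK
    ]
```

Calling `rna_dna_mismatches()` yields three mismatches for each RNA base, twelve in total.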

Myers et al. (1985) showed that certain types of mismatch were more frequently
and more completely cleaved by RNase A than others. For example, the rC/dA, rC/dC,
and rC/dT mismatches were cleaved in all cases, while the rG/dA mismatch was only
cleaved in 13% of the cases tested and the rG/dT mismatch was almost completely
resistant to cleavage. In general, the complement of a difficult-to-detect mismatch was
much easier to detect. For example, the refractory rG/dT mismatch generated by probing
a G to A mutant target with a wild type sense-strand probe is complemented by the easily
cleaved rC/dA mismatch generated by probing the mutant target with the wild type
antisense strand. By probing both target strands, Myers and Maniatis (1986) estimated
that at least 50% of all single-base mutations would be detected by the RNase A cleavage
assay. These authors stated that approximately one-third of all possible types of
single-base substitutions would be detected by using a single probe for just one strand of the
target DNA (Myers et al., 1985).
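
The relationship between a mutation and the pair of complementary mismatches it produces can be worked through mechanically. The following Python sketch is an illustration under stated assumptions: both bases are given as they appear on the sense strand, the probe is transcribed from wild type sequence, and the function and key names are hypothetical.

```python
DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def to_rna(base):
    """Write a DNA base as the RNA base a transcribed probe would carry."""
    return "U" if base == "T" else base


def probe_mismatches(wild_type_base, mutant_base):
    """Return the mismatches seen by sense- and antisense-strand wild type probes.

    Both arguments are sense-strand DNA bases at the mutated position.
    A sense-strand probe anneals to the antisense strand of the mutant target;
    an antisense-strand probe anneals to the sense strand of the mutant target.
    """
    sense_probe = to_rna(wild_type_base)
    antisense_probe = to_rna(DNA_COMPLEMENT[wild_type_base])
    return {
        "sense_probe": f"r{sense_probe}/d{DNA_COMPLEMENT[mutant_base]}",
        "antisense_probe": f"r{antisense_probe}/d{mutant_base}",
    }
```

For the G to A mutation discussed above, `probe_mismatches("G", "A")` gives rG/dT for the sense-strand probe and rC/dA for the antisense-strand probe, matching the refractory and easily cleaved pair named in the text.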

In the typical RNase cleavage assays, the separating gels are run under denaturing
conditions for analysis of the cleavage products. This requires the RNase to be
inactivated by treating the reaction with protease (usually Proteinase K, often in the
presence of SDS) to degrade the RNase. This reaction is generally followed by an
organic extraction with a phenol/chloroform solution to remove proteins and residual
RNase activity. The organic extraction is then followed by concentration and recovery of
the cleavage products by alcohol precipitation (Myers et al., 1985; Winter et al., 1985;
Theophilus et al., 1989).

4. Separation Methods

Following amplification, it may be desirable to separate the amplification product
from the template and the excess primer for the purpose of determining whether specific
amplification has occurred. In one embodiment, amplification products are separated by
agarose, agarose-acrylamide or polyacrylamide gel electrophoresis using standard
methods. See Sambrook et al., 1989.

Alternatively, chromatographic techniques may be employed to effect separation.
There are many kinds of chromatography which may be used in the present invention:
adsorption, partition, ion-exchange and molecular sieve, and many specialized techniques
for using them including column, paper, thin-layer and gas chromatography (Freifelder,
1982).

5. Identification Methods 

Amplification products must be visualized in order to confirm amplification of the
marker sequences. One typical visualization method involves staining of a gel with
ethidium bromide and visualization under UV light. Alternatively, if the amplification
products are integrally labeled with radio- or fluorometrically-labeled nucleotides, the
amplification products can then be exposed to x-ray film or visualized under the
appropriate stimulating spectra, following separation.

In one embodiment, visualization is achieved indirectly. Following separation of
amplification products, a labeled, nucleic acid probe is brought into contact with the
amplified marker sequence. The probe preferably is conjugated to a chromophore but
may be radiolabeled. In another embodiment, the probe is conjugated to a binding
partner, such as an antibody or biotin, and the other member of the binding pair carries a
detectable moiety.

In one embodiment, detection is by Southern blotting and hybridization with a
labeled probe. The techniques involved in Southern blotting are well known to those of
skill in the art and can be found in many standard books on molecular protocols. See
Sambrook et al., 1989. Briefly, amplification products are separated by gel
electrophoresis. The gel is then contacted with a membrane, such as nitrocellulose,
permitting transfer of the nucleic acid and non-covalent binding. Subsequently, the
membrane is incubated with a chromophore-conjugated probe that is capable of
hybridizing with a target amplification product. Detection is by exposure of the
membrane to x-ray film or ion-emitting detection devices.

One example of the foregoing is described in U.S. Patent No. 5,279,721,
incorporated by reference herein, which discloses an apparatus and method for the
automated electrophoresis and transfer of nucleic acids. The apparatus permits
electrophoresis and blotting without external manipulation of the gel and is ideally suited
to carrying out methods according to the present invention.



6. Kit Components

All the essential materials and reagents required for detecting type-2 diabetes
markers in a biological sample may be assembled together in a kit. This generally will
comprise pre-selected primers for specific markers. Also included may be enzymes
suitable for amplifying nucleic acids including various polymerases (RT, Taq, etc.),
deoxynucleotides and buffers to provide the necessary reaction mixture for amplification.

Such kits generally will comprise, in suitable means, distinct containers for each
individual reagent and enzyme as well as for each marker primer pair. Preferred pairs of
primers for amplifying nucleic acids are selected to amplify the sequences specified in
SEQ ID NO:1 along with any other cDNAs for calpain 10. In other embodiments
preferred pairs of primers for amplification are selected to amplify any of the regions
specified in SEQ ID NO:1.

In another embodiment, such kits will comprise hybridization probes specific for
calpain 10, chosen from a group including nucleic acids corresponding to the sequence
specified in SEQ ID NO:1. Such kits generally will comprise, in suitable means, distinct
containers for each individual reagent and enzyme as well as for each marker
hybridization probe.

L. Use of RNA Fingerprinting to Identify Type 2 Diabetes Markers

RNA fingerprinting is a means by which RNAs isolated from many different
tissues, cell types or treatment groups can be sampled simultaneously to identify RNAs
whose relative abundances vary. Two forms of this technology were developed
simultaneously and reported in 1992 as RNA fingerprinting by differential display (Liang
and Pardee, 1992; Welsh et al., 1992). (See also Liang and Pardee, U.S. patent
5,262,311, incorporated herein by reference in its entirety.) Some of the experiments
described herein were performed similarly to Donahue et al., J. Biol. Chem. 269:
8604-8609, 1994.



All forms of RNA fingerprinting by PCR are theoretically similar but differ in
their primer design and application. The most striking difference between differential
display and other methods of RNA fingerprinting is that differential display utilizes
anchoring primers that hybridize to the poly A tails of mRNAs. As a consequence, the
PCR products amplified in differential display are biased towards the 3' untranslated
regions of mRNAs.

The basic technique of differential display has been described in detail (Liang and
Pardee, 1992). Total cell RNA is primed for first strand reverse transcription with an
anchoring primer composed of oligo dT and any two of the four deoxynucleosides. The
oligo dT primer is extended using a reverse transcriptase, for example, Moloney Murine
Leukemia Virus (MMLV) reverse transcriptase. The synthesis of the second strand is
primed with an arbitrarily chosen oligonucleotide, using reduced stringency conditions.
Once the double-stranded cDNA has been synthesized, amplification proceeds by
standard PCR techniques, utilizing the same primers. The resulting DNA fingerprint is
analyzed by gel electrophoresis and ethidium bromide staining or autoradiography. A
side by side comparison of fingerprints obtained from, for example, tumor versus normal
tissue samples using the same oligonucleotide primers identifies mRNAs that are
differentially expressed.

RNA fingerprinting technology has been demonstrated as being effective in
identifying genes that are differentially expressed in cancer (Liang et al., 1992; Wong et
al., 1993; Sager et al., 1993; Mok et al., 1994; Watson et al., 1994; Chen et al., 1995; An
et al., 1995). The present invention utilizes the RNA fingerprinting technique to identify
genes that are differentially expressed in diabetes.

1. Design and Theoretical Considerations for Relative Quantitative RT-PCR

Reverse transcription (RT) of RNA to cDNA followed by relative quantitative 
PCR (RT-PCR) can be used to determine the relative concentrations of specific mRNA 
species isolated from type 2 diabetes patients. By determining that the concentration of a 
specific mRNA species varies, it is shown that the gene encoding the specific mRNA 
species is differentially expressed. This technique can be used to confirm that mRNA 
transcripts shown to be differentially regulated by RNA fingerprinting are differentially 
expressed in type 2 diabetes. 

In PCR, the number of molecules of the amplified target DNA increases by a factor
approaching two with every cycle of the reaction until some reagent becomes limiting.
Thereafter, the rate of amplification becomes increasingly diminished until there is no 
increase in the amplified target between cycles. If a graph is plotted in which the cycle 
number is on the X axis and the log of the concentration of the amplified target DNA is 
on the Y axis, a curved line of characteristic shape is formed by connecting the plotted 
points. Beginning with the first cycle, the slope of the line is positive and constant. This 
is said to be the linear portion of the curve. After a reagent becomes limiting, the slope of 
the line begins to decrease and eventually becomes zero. At this point the concentration 
of the amplified target DNA becomes asymptotic to some fixed value. This is said to be 
the plateau portion of the curve. 
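
The shape of the curve described above can be reproduced with a toy model. The Python sketch below is not a kinetic description of any real reaction: the per-cycle efficiency of 0.95, the plateau of 1e12 copies and the rule that the gain shrinks in proportion to how close the product is to that plateau are all illustrative assumptions standing in for "some reagent becomes limiting".

```python
import math


def simulate_pcr(initial_copies, cycles, efficiency=0.95, plateau_copies=1e12):
    """Return the copy number after each cycle, starting with cycle 0.

    The per-cycle gain shrinks as the product approaches `plateau_copies`,
    a stand-in for a reagent becoming limiting (illustrative assumption).
    """
    copies = [float(initial_copies)]
    for _ in range(cycles):
        current = copies[-1]
        gain = efficiency * max(0.0, 1.0 - current / plateau_copies)
        copies.append(current * (1.0 + gain))
    return copies


def log_slopes(copies):
    """Slope of log10(copies) between consecutive cycles."""
    return [math.log10(b / a) for a, b in zip(copies, copies[1:])]
```

Applying `log_slopes` to the output of `simulate_pcr(1000, 50)` gives a nearly constant slope close to log10(1.95) in the early cycles, which is the linear portion on a log plot, and a slope that falls toward zero once the copy number nears the plateau.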

The concentration of the target DNA in the linear portion of the PCR 
amplification is directly proportional to the starting concentration of the target before the 
reaction began. By determining the concentration of the amplified products of the target 
DNA in PCR reactions that have completed the same number of cycles and are in their 
linear ranges, it is possible to determine the relative concentrations of the specific target 
sequence in the original DNA mixture. If the DNA mixtures are cDNAs synthesized 
from RNAs isolated from different tissues or cells, the relative abundances of the specific 
mRNA from which the target sequence was derived can be determined for the respective
tissues or cells. This direct proportionality between the concentration of the PCR products
and the relative mRNA abundances is only true in the linear range of the PCR reaction. 
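
As a hedged illustration of why sampling must occur in the linear range, the sketch below
compares two hypothetical reactions whose starting template amounts differ four-fold. It
relies on an assumed saturation model; the function amplify, the ceiling n_max and all
numeric values are illustrative and do not come from the experiments described herein.

```python
def amplify(n0, cycles, n_max=1e12):
    """Copy number after the given number of cycles under an assumed toy model
    in which the per-cycle gain drops to zero as the product nears n_max."""
    n = float(n0)
    for _ in range(cycles):
        n *= 1.0 + max(0.0, 1.0 - n / n_max)
    return n


# Two hypothetical cDNA samples with a four-fold difference in target abundance.
start_a, start_b = 1e3, 4e3

# Sampled after 15 cycles, both reactions are still in the linear range.
ratio_linear = amplify(start_b, 15) / amplify(start_a, 15)

# Sampled after 45 cycles, both reactions have reached the plateau.
ratio_plateau = amplify(start_b, 45) / amplify(start_a, 45)

print("product ratio in linear range:", round(ratio_linear, 3))
print("product ratio at plateau:     ", round(ratio_plateau, 3))
```

In this model the product ratio tracks the starting ratio while both reactions are in the
linear range, and collapses toward one once both have reached the plateau.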

The final concentration of the target DNA in the plateau portion of the curve is 
determined by the availability of reagents in the reaction mix and is independent of the
original concentration of target DNA. Therefore, the first condition that must be met 
before the relative abundances of a mRNA species can be determined by RT-PCR for a 
collection of RNA populations is that the concentrations of the amplified PCR products 
must be sampled when the PCR reactions are in the linear portion of their curves. 

The second condition that must be met for an RT-PCR experiment to successfully 
determine the relative abundances of a particular mRNA species is that relative 
concentrations of the amplifiable cDNAs must be normalized to some independent 
standard. The goal of an RT-PCR experiment is to determine the abundance of a 
particular mRNA species relative to the average abundance of all mRNA species in the
sample. In the experiments described below, mRNAs for β-actin, asparagine synthetase
and lipocortin II were used as external and internal standards to which the relative
abundance of other mRNAs is compared.
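
The normalization step can be sketched as follows. This is an illustrative example only:
the signal values are invented for demonstration, the sample labels are hypothetical, and
the helper normalized_abundance is not a function of any particular software package. It
assumes that the signal of the standard (for example β-actin) and of the target were both
measured in the linear range of amplification.

```python
def normalized_abundance(target_signal, standard_signal):
    """Express the target signal relative to the standard measured in the same sample."""
    if standard_signal <= 0:
        raise ValueError("standard signal must be positive")
    return target_signal / standard_signal


# Hypothetical band intensities (arbitrary units); not experimental data.
samples = {
    "control": {"target": 1200.0, "standard": 24000.0},
    "type2_diabetes": {"target": 600.0, "standard": 20000.0},
}

normalized = {
    name: normalized_abundance(s["target"], s["standard"])
    for name, s in samples.items()
}

# Relative expression of the target in the patient sample versus the control.
fold_change = normalized["type2_diabetes"] / normalized["control"]
print("fold change relative to control:", round(fold_change, 2))
```

Dividing by the standard removes differences in the amount of amplifiable cDNA between
samples, so that the remaining difference reflects the target mRNA itself.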

Most protocols for competitive PCR utilize internal PCR standards that are
approximately as abundant as the target. These strategies are effective if the products of
the PCR amplifications are sampled during their linear phases. If the products are
sampled when the reactions are approaching the plateau phase, then the less abundant
product becomes relatively overrepresented. Comparisons of relative abundances made
for many different RNA samples, such as is the case when examining RNA samples for
differential expression, become distorted in such a way as to make differences in relative
abundances of RNAs appear less than they actually are. This is not a significant problem
if the internal standard is much more abundant than the target. If the internal standard is
more abundant than the target, then direct linear comparisons can be made between RNA
samples.

The above discussion describes theoretical considerations for an RT-PCR assay
for clinically derived materials. The problems inherent in clinical samples are that they 
are of variable quantity (making normalization problematic), and that they are of variable 
quality (necessitating the co-amplification of a reliable internal control, preferably of 
larger size than the target). Both of these problems are overcome if the RT-PCR is 
performed as a relative quantitative RT-PCR with an internal standard in which the 
internal standard is an amplifiable cDNA fragment that is larger than the target cDNA 
fragment and in which the abundance of the mRNA encoding the internal standard is 
roughly 5-100 fold higher than the mRNA encoding the target. This assay measures 
relative abundance, not absolute abundance of the respective mRNA species. 

Other studies may be performed using a more conventional relative quantitative 
RT-PCR assay with an external standard protocol. These assays sample the PCR 
products in the linear portion of their amplification curves. The number of PCR cycles 
that are optimal for sampling must be empirically determined for each target cDNA 
fragment. In addition, the reverse transcriptase products of each RNA population isolated 
from the various tissue samples must be carefully normalized for equal concentrations of 
amplifiable cDNAs. This consideration is very important since the assay measures 
absolute mRNA abundance. Absolute mRNA abundance can be used as a measure of 
differential gene expression only in normalized samples. While empirical determination 
of the linear range of the amplification curve and normalization of cDNA preparations are 
tedious and time-consuming processes, the resulting RT-PCR assays can be superior to
those derived from the relative quantitative RT-PCR assay with an internal standard. 
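
One way to approach the empirical determination of the sampling cycle is sketched below.
This is a minimal illustration under stated assumptions, not the procedure used herein: it
assumes that product has been measured at consecutive cycles, that the earliest readings
lie in the linear range, and that a 10% deviation in the log slope marks the end of that
range. The function last_linear_cycle, the tolerance and the readings are hypothetical.

```python
import math


def last_linear_cycle(signal_by_cycle, tolerance=0.1):
    """Return the index of the last reading still in the linear range.

    The slope of log10(signal) between the first two readings is taken as the
    reference; the linear range ends at the first step whose slope deviates
    from the reference by more than the given fraction.
    """
    logs = [math.log10(s) for s in signal_by_cycle]
    slopes = [b - a for a, b in zip(logs, logs[1:])]
    reference = slopes[0]
    if reference <= 0:
        raise ValueError("first two readings must show amplification")
    last = 0
    for cycle, slope in enumerate(slopes, start=1):
        if abs(slope - reference) > tolerance * reference:
            break
        last = cycle
    return last


# Hypothetical readings (arbitrary units) at consecutive cycles; not real data.
readings = [1, 2, 4, 8, 16, 28, 40, 44, 44]
cutoff = last_linear_cycle(readings)
print("last reading in linear range:", cutoff)
```

With these invented readings the doubling stops after the fifth reading (index 4), so later
cycles would be unsuitable for sampling under the stated tolerance.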

One reason for this advantage is that without the internal standard/competitor, all 
of the reagents can be converted into a single PCR product in the linear range of the 
amplification curve, thus increasing the sensitivity of the assay. Another reason is that 
with only one PCR product, display of the product on an electrophoretic gel or another 
display method becomes less complex, has less background and is easier to interpret.

M. Methods for Calpain 10 Gene Expression

In one embodiment of the present invention, there are provided methods for
increasing calpain 10 gene expression or activation in a cell. This is particularly useful
where there is an aberration in the gene product or gene expression is not sufficient for
normal function. This will allow for the alleviation of symptoms of type 2 diabetes
experienced as a result of deficiency of calpain 10. Further, given that calpain 10 is a
protease and that there is a great diversity of proteases and of the functions they
perform, additional proteases may be implicated in diabetes susceptibility. Specifically,
one of the side effects of the long-term use of protease inhibitors in patients with AIDS is
diabetes (Flexner, 1998). Thus, calpain 10 gene expression could be increased or
activated in such patients.

The general approach to increasing calpain 10 activity according to the present
invention will be to provide a cell with a calpain 10 polypeptide. While it is conceivable
that the protein may be delivered directly, a preferred embodiment involves providing a
nucleic acid encoding a calpain 10 polypeptide, i.e., a calpain 10 gene, to the cell.
Following this provision, the calpain 10 polypeptide is synthesized by the host cell's
transcriptional and translational machinery, as well as any that may be provided by the
expression construct. Cis-acting regulatory elements necessary to support the expression of
the calpain 10 gene will be provided, in the form of an expression construct. It also is
possible that expression of virally-encoded calpain 10 could be stimulated or enhanced, or
the expressed polypeptide be stabilized, thereby achieving the same or similar effect.

In order to effect expression of constructs encoding calpain 10 and other calpain 10-like
genes, the expression construct must be delivered into a cell. One mechanism for
delivery is via viral infection, where the expression construct is encapsidated in a viral
particle which will deliver either a replicating or non-replicating nucleic acid. In certain
embodiments an HSV vector is used, although virtually any vector would suffice.

Several non-viral methods for the transfer of expression constructs into cultured
mammalian cells also are contemplated by the present invention. These include calcium
phosphate precipitation (Graham and Van Der Eb, 1973; Chen and Okayama, 1987; Rippe
et al., 1990), DEAE-dextran (Gopal, 1985), electroporation (Tur-Kaspa et al., 1986; Potter
et al., 1984), direct microinjection (Harland and Weintraub, 1985), DNA-loaded liposomes
(Nicolau and Sene, 1982; Fraley et al., 1979) and lipofectamine-DNA complexes, cell
sonication (Fechheimer et al., 1987), gene bombardment using high velocity
microprojectiles (Yang et al., 1990), and receptor-mediated transfection (Wu and Wu,
1987; Wu and Wu, 1988). Some of these techniques may be successfully adapted for in
vivo or ex vivo use, as discussed below.

In another embodiment of the invention, the expression construct may simply
consist of naked recombinant DNA or plasmids. Transfer of the construct may be
performed by any of the methods mentioned above which physically or chemically
permeabilize the cell membrane. This is particularly applicable for transfer in vitro, but it
may be applied to in vivo use as well. Another embodiment of the invention for transferring
a naked DNA expression construct into cells may involve particle bombardment. This
method depends on the ability to accelerate DNA-coated microprojectiles to a high velocity
allowing them to pierce cell membranes and enter cells without killing them (Klein et al.,
1987). Several devices for accelerating small particles have been developed. One such
device relies on a high voltage discharge to generate an electrical current, which in turn
provides the motive force (Yang et al., 1990). The microprojectiles used have consisted of
biologically inert substances such as tungsten or gold beads.

In a further embodiment of the invention, the expression construct may be entrapped
in a liposome. Liposomes are vesicular structures characterized by a phospholipid bilayer
membrane and an inner aqueous medium. Multilamellar liposomes have multiple lipid
layers separated by aqueous medium. They form spontaneously when phospholipids are
suspended in an excess of aqueous solution. The lipid components undergo
self-rearrangement before the formation of closed structures and entrap water and dissolved
solutes between the lipid bilayers (Ghosh and Bachhawat, 1991). Also contemplated are
lipofectamine-DNA complexes.

Liposome-mediated nucleic acid delivery and expression of foreign DNA in vitro
has been very successful. Wong et al. (1980) demonstrated the feasibility of
liposome-mediated delivery and expression of foreign DNA in cultured chick embryo, HeLa and
hepatoma cells. In certain embodiments of the invention, the liposome may be complexed
with a hemagglutinating virus (HVJ). This has been shown to facilitate fusion with the cell
membrane and promote cell entry of liposome-encapsulated DNA (Kaneda et al., 1989). In
other embodiments, the liposome may be complexed or employed in conjunction with
nuclear non-histone chromosomal proteins (HMG-1) (Kato et al., 1991). In yet further
embodiments, the liposome may be complexed or employed in conjunction with both HVJ
and HMG-1. In other embodiments, the delivery vehicle may comprise a ligand and a
liposome. Where a bacterial promoter is employed in the DNA construct, it also will be
desirable to include within the liposome an appropriate bacterial polymerase.

Other expression constructs which can be employed to deliver a nucleic acid
encoding a calpain 10 transgene into cells are receptor-mediated delivery vehicles. These
take advantage of the selective uptake of macromolecules by receptor-mediated endocytosis
in almost all eukaryotic cells. Because of the cell type-specific distribution of various
receptors, the delivery can be highly specific (Wu and Wu, 1993).

Receptor-mediated gene targeting vehicles generally consist of two components: a
cell receptor-specific ligand and a DNA-binding agent. Several ligands have been used for
receptor-mediated gene transfer. The most extensively characterized ligands are
asialoorosomucoid (ASOR) (Wu and Wu, 1987) and transferrin (Wagner et al., 1990).
Recently, a synthetic neoglycoprotein, which recognizes the same receptor as ASOR, has
been used as a gene delivery vehicle (Ferkol et al., 1993; Perales et al., 1994). Mannose can
be used to target the mannose receptor on liver cells. Also, antibodies to CD5 (CLL), CD22
(lymphoma), CD25 (T-cell leukemia) and MAA (melanoma) can similarly be used as
targeting moieties. In other embodiments, the delivery vehicle may comprise a ligand and a
liposome.

Primary mammalian cell cultures may be prepared in various ways. In order for the
cells to be kept viable while in vitro and in contact with the expression construct, it is
necessary to ensure that the cells maintain contact with the correct ratio of oxygen and
carbon dioxide and nutrients but are protected from microbial contamination. Cell culture
techniques are well documented and are disclosed herein by reference (Freshner, 1992).

One embodiment of the foregoing involves the use of gene transfer to immortalize
cells for the production of proteins. The gene for the protein of interest may be
transferred as described above into appropriate host cells followed by culture of cells
under the appropriate conditions. The gene for virtually any polypeptide may be
employed in this manner. The generation of recombinant expression vectors, and the
elements included therein, are discussed above. Alternatively, the protein to be produced
may be an endogenous protein normally synthesized by the cell in question.

Examples of useful mammalian host cell lines are Vero and HeLa cells and cell
lines of Chinese hamster ovary, WI38, BHK, COS-7, 293, HepG2, NIH3T3, RIN and
MDCK cells. In addition, a host cell strain may be chosen that modulates the expression
of the inserted sequences, or modifies and processes the gene product in the manner
desired. Such modifications (e.g., glycosylation) and processing (e.g., cleavage) of
protein products may be important for the function of the protein. Different host cells
have characteristic and specific mechanisms for the post-translational processing and
modification of proteins. Appropriate cell lines or host systems can be chosen to ensure
the correct modification and processing of the foreign protein expressed.

A number of selection systems may be used including, but not limited to, HSV
thymidine kinase, hypoxanthine-guanine phosphoribosyltransferase and adenine
phosphoribosyltransferase genes, in tk-, hgprt- or aprt- cells, respectively. Also,
anti-metabolite resistance can be used as the basis of selection for dhfr, that confers
resistance to methotrexate; gpt, that confers resistance to mycophenolic acid; neo, that
confers resistance to the aminoglycoside G418; and hygro, that confers resistance to
hygromycin.



Animal cells can be propagated in vitro in two modes: as non-anchorage
dependent cells growing in suspension throughout the bulk of the culture or as
anchorage-dependent cells requiring attachment to a solid substrate for their propagation
(i.e., a monolayer type of cell growth).

Non-anchorage dependent or suspension cultures from continuous established cell
lines are the most widely used means of large scale production of cells and cell products.
However, suspension cultured cells have limitations, such as tumorigenic potential and
lower protein production than adherent cells.

Large scale suspension culture of mammalian cells in stirred tanks is a common
method for production of recombinant proteins. Two suspension culture reactor designs
are in wide use: the stirred reactor and the airlift reactor. The stirred design has
successfully been used on an 8000 liter capacity for the production of interferon. Cells
are grown in a stainless steel tank with a height-to-diameter ratio of 1:1 to 3:1. The
culture is usually mixed with one or more agitators, based on bladed disks or marine
propeller patterns. Agitator systems offering less shear forces than blades have been
described. Agitation may be driven either directly or indirectly by magnetically coupled
drives. Indirect drives reduce the risk of microbial contamination through seals on stirrer
shafts.

The airlift reactor, also initially described for microbial fermentation and later
adapted for mammalian culture, relies on a gas stream to both mix and oxygenate the
culture. The gas stream enters a riser section of the reactor and drives circulation. Gas
disengages at the culture surface, causing denser liquid free of gas bubbles to travel
downward in the downcomer section of the reactor. The main advantage of this design is
the simplicity and lack of need for mechanical mixing. Typically, the height-to-diameter
ratio is 10:1. The airlift reactor scales up relatively easily, has good mass transfer of
gases and generates relatively low shear forces.



N. Methods for Blocking Calpain 10 Action

In another embodiment of the present invention, there is contemplated the method
of blocking the function of calpain 10 in type 2 diabetes. In this way, it may be possible
to curtail the effects of excess calpain 10 in diabetes. In addition, it may prove effective
to use this sort of therapeutic intervention in combination with more traditional diabetes
therapies, such as the administration of insulin.

The general form that this aspect of the invention will take is the provision, to a
cell, of an agent that will inhibit calpain 10 function. Four such agents are contemplated.
First, one may employ an antisense nucleic acid that will hybridize either to the calpain
10 gene or the calpain 10 gene transcript, thereby preventing transcription or translation,
respectively. The considerations relevant to the design of antisense constructs have been
presented above. Second, one may utilize a calpain 10-binding protein or peptide, for
example, a peptidomimetic or an antibody that binds immunologically to calpain 10. The
binding of either will block or reduce the activity of the calpain 10. The methods of
making and selecting peptide binding partners and antibodies are well known to those of
skill in the art. Third, one may provide to the cell an antagonist of calpain 10, for
example, an inhibitor, alone or coupled to another agent. Fourth, one may provide an
agent that binds to the calpain 10 substrate(s) without the same functional result as would
arise with calpain 10 binding.

The compounds anticipated herein have activity as inhibitors of proteases, such as
cysteine proteases, including calpain. It is believed by those of skill in this art that
excessive activation of the Ca2+-dependent protease calpain plays a role in the pathology
of a variety of disorders, including cerebral ischaemia, cataract, myocardial ischaemia,
muscular dystrophy and platelet aggregation. Thus, compounds that have activity as
calpain inhibitors are considered by those of skill in this art to be useful (U.S. Pat. No.
5,081,284; Sherwood et al., 1993). Assays that measure the anti-calpain activity of
selected compounds are known to those of skill in the art (U.S. Pat. No. 5,081,284).
Activities of inhibitors in such in vitro assays at concentrations (IC50) in the nanomolar
range or lower are indicative of therapeutic activity. Such compounds also have utility in
the purification of proteinases, such as cysteine proteases, on affinity columns of these
compounds (U.S. Pat. No. 5,081,284). Also, calpain inhibitors, such as
N-acetylleucylleucylnorleucinal (EP 0 504 938 A2; Sherwood et al., 1993), are used as
reagents in the study of protein trafficking and other cellular processes (Sharma et al.,
1992). Finally, inhibitors of cysteine proteases strongly inhibit the growth of Plasmodium
falciparum and Schistosoma mansoni (Scheibel et al., 1984).

Provision of a calpain 10 gene, a calpain 10 protein, or a calpain 10 antagonist
would be according to any appropriate pharmaceutical route. The formulation of such
compositions and their delivery to tissues is discussed below. The method by which the
nucleic acid, protein or chemical is transferred, along with the preferred delivery route, will
be selected based on the particular site to be treated. Those of skill in the art are capable of
determining the most appropriate methods based on the relevant clinical considerations.

Many of the gene transfer techniques that generally are applied in vitro can be
adapted for ex vivo or in vivo use. For example, selected organs including the liver, skin,
and muscle tissue of rats and mice have been bombarded in vivo (Yang et al., 1990; Zelenin
et al., 1991). Naked DNA also has been used in clinical settings to effect gene therapy.
These approaches may require surgical exposure of the target tissue or direct target tissue
injection. Nicolau et al. (1987) accomplished successful liposome-mediated gene transfer
in rats after intravenous injection.

Dubensky et al. (1984) successfully injected polyomavirus DNA in the form of
CaPO4 precipitates into liver and spleen of adult and newborn mice, demonstrating active
viral replication and acute infection. Benvenisty and Neshif (1986) also demonstrated that
direct intraperitoneal injection of CaPO4-precipitated plasmids results in expression of the
transfected genes. Thus, it is envisioned that DNA encoding an antisense construct also
may be transferred in a similar manner in vivo.

Where the embodiment involves the use of an antibody that recognizes a calpain 10
polypeptide, consideration must be given to the mechanism by which the antibody is
introduced into the cell cytoplasm. This can be accomplished, for example, by providing an
expression construct that encodes a single-chain antibody version of the antibody to be
provided. Most of the discussion above relating to expression constructs for antisense
versions of the calpain 10 gene will be relevant to this aspect of the invention.
Alternatively, it is possible to present a bifunctional antibody, where one antigen binding
arm of the antibody recognizes a calpain 10 polypeptide and the other antigen binding arm
recognizes a receptor on the surface of the cell to be targeted. Examples of suitable
receptors would be an HSV glycoprotein such as gB, gC, gD, or gH. In addition, it may be
possible to exploit the Fc-binding function associated with HSV gE, thereby obviating the
need to sacrifice one arm of the antibody for purposes of cell targeting.

Advantageously, one may combine this approach with more conventional diabetes
therapy options.

O. Transgenic Animals/Knockout Animals

In one embodiment of the invention, transgenic animals are produced which
contain a functional transgene encoding wild-type or mutant calpain 10 polypeptides.
Transgenic animals expressing calpain 10 transgenes, recombinant cell lines derived from
such animals and transgenic embryos may be useful in methods for screening for and
identifying agents that induce or repress function of calpain 10. Such models will be
useful in identifying novel agents that will be useful in a diabetes therapeutic
context. Transgenic animals of the present invention also can be used as models for
studying indications of abnormal calpain 10 expression in diabetes.

In one embodiment of the invention, a calpain 10 transgene is introduced into a
non-human host to produce a transgenic animal expressing a human calpain 10. The
transgenic animal is produced by the integration of the transgene into the genome in a
manner that permits the expression of the transgene. Methods for producing transgenic
animals are generally described by Wagner and Hoppe (U.S. Patent 4,873,191; which is
incorporated herein by reference), Brinster et al. (1985; which is incorporated herein by
reference in its entirety) and in "Manipulating the Mouse Embryo; A Laboratory Manual"
2nd edition (eds., Hogan, Beddington, Costantini and Lacy, Cold Spring Harbor
Laboratory Press, 1994; which is incorporated herein by reference in its entirety).

Additional descriptions for generating transgenic animal models may be found in
numerous published patents including, but not limited to, U.S. Patent 5,817,912; U.S.
Patent 5,817,911; U.S. Patent 5,814,716; U.S. Patent 5,814,318; U.S. Patent 5,811,634;
U.S. Patent 5,741,957; U.S. Patent 5,731,489; U.S. Patent 5,770,429; and U.S. Patent
5,718,883. Each of these patents is specifically incorporated herein by reference as
teaching methods and compositions for the production of transgenic animals.

It may be desirable to replace the endogenous calpain 10 by homologous
recombination between the transgene and the endogenous gene; or the endogenous gene
may be eliminated by deletion as in the preparation of "knock-out" animals. Typically, a
calpain 10 gene flanked by genomic sequences is transferred by microinjection into a
fertilized egg. The microinjected eggs are implanted into a host female, and the progeny
are screened for the expression of the transgene. Transgenic animals may be produced
from the fertilized eggs from a number of animals including, but not limited to, rodents,
reptiles, amphibians, birds, mammals, and fish. Within a particularly preferred
embodiment, transgenic mice are generated which overexpress calpain 10 or express a
mutant form of the polypeptide. Alternatively, the absence of a calpain 10 in "knock-out"
mice permits the study of the effects that loss of calpain 10 protein has on a cell in vivo.
Knock-out mice also provide a model for the development of calpain 10-related
abnormalities such as diabetes.

As noted above, transgenic animals and cell lines derived from such animals may
find use in certain testing experiments. In this regard, transgenic animals and cell lines
capable of expressing wild-type or mutant calpain 10 may be exposed to test substances.
These test substances can be screened for the ability to enhance wild-type calpain 10
expression and/or function or impair the expression or function of calpain 10.

P. Pharmaceuticals and In vivo Methods for the Treatment of Disease 

Aqueous pharmaceutical compositions of the present invention will have an
effective amount of a calpain 10 expression construct, an antisense calpain 10 expression
construct, an expression construct that encodes a therapeutic gene along with calpain 10,
or a protein or compound that inhibits mutated calpain 10 function, such as an
anti-calpain 10 antibody. Pharmaceutical compositions of the present invention may also
have an effective amount of a calpain inhibitor, such as calpeptin, calpain inhibitor 1,
calpain inhibitor 2 (N-acetyl-leucyl-leucyl-methioninal, ALLM), or E-64-d. Such
compositions generally will be dissolved or dispersed in a pharmaceutically acceptable
carrier or aqueous medium. An "effective amount," for the purposes of therapy, is
defined as that amount that causes a clinically measurable difference in the condition of
the subject. This amount will vary depending on the substance, the condition of the
patient, the type of treatment, the location of the lesion, etc.

The phrases "pharmaceutically or pharmacologically acceptable" refer to
molecular entities and compositions that do not produce an adverse, allergic or other
untoward reaction when administered to an animal, or human, as appropriate. As used
herein, "pharmaceutically acceptable carrier" includes any and all solvents, dispersion
media, coatings, antibacterial and antifungal agents, isotonic and absorption delaying
agents and the like. The use of such media and agents for pharmaceutically active
substances is well known in the art. Except insofar as any conventional media or agent is
incompatible with the active ingredients, its use in the therapeutic compositions is
contemplated. Supplementary active ingredients, such as other anti-diabetic agents, can
also be incorporated into the compositions.

In addition to the compounds formulated for parenteral administration, such as
those for intravenous or intramuscular injection, other pharmaceutically acceptable forms
include, e.g., tablets or other solids for oral administration; time release capsules; and any
other form currently used, including cremes, lotions, mouthwashes, inhalants and the like.

The active compounds of the present invention will often be formulated for
parenteral administration, e.g., formulated for injection via the intravenous,
intramuscular, subcutaneous, or even intraperitoneal routes. The preparation of an
aqueous composition that contains calpain 10 inhibitory compounds alone or in
combination with conventional diabetes therapy agents as active ingredients will be
known to those of skill in the art in light of the present disclosure. Typically, such
compositions can be prepared as injectables, either as liquid solutions or suspensions;
solid forms suitable for using to prepare solutions or suspensions upon the addition of a
liquid prior to injection can also be prepared; and the preparations can also be emulsified.

Solutions of the active compounds as free base or pharmacologically acceptable
salts can be prepared in water suitably mixed with a surfactant, such as
hydroxypropylcellulose. Dispersions can also be prepared in glycerol, liquid polyethylene
glycols, and mixtures thereof and in oils. Under ordinary conditions of storage and use,
these preparations contain a preservative to prevent the growth of microorganisms.

The pharmaceutical forms suitable for injectable use include sterile aqueous
solutions or dispersions; formulations including sesame oil, peanut oil or aqueous
propylene glycol; and sterile powders for the extemporaneous preparation of sterile
injectable solutions or dispersions. In many cases, the form must be sterile and must be
fluid to the extent that easy syringability exists. It must be stable under the conditions of
manufacture and storage and must be preserved against the contaminating action of
microorganisms, such as bacteria and fungi.

The active compounds may be formulated into a composition in a neutral or salt
form. Pharmaceutically acceptable salts include the acid addition salts (formed with the
free amino groups of the protein), which are formed with inorganic acids such as, for
example, hydrochloric or phosphoric acids, or such organic acids as acetic, oxalic,
tartaric, mandelic, and the like. Salts formed with the free carboxyl groups can also be
derived from inorganic bases such as, for example, sodium, potassium, ammonium,
calcium, or ferric hydroxides, and such organic bases as isopropylamine, trimethylamine,
histidine, procaine and the like.

The carrier also can be a solvent or dispersion medium containing, for example, 
water, ethanol, polyol (for example, glycerol, propylene glycol, and liquid polyethylene 
glycol, and the like), suitable mixtures thereof, and vegetable oils. The proper fluidity 
can be maintained, for example, by the use of a coating, such as lecithin, by the 
maintenance of the required particle size in the case of dispersion and by the use of 
surfactants. The prevention of the action of microorganisms can be brought about by 
various antibacterial and antifungal agents, for example, parabens, chlorobutanol, phenol, 
sorbic acid, thimerosal, and the like. In many cases, it will be preferable to include 
isotonic agents, for example, sugars or sodium chloride. Prolonged absorption of the 
injectable compositions can be brought about by the use in the compositions of agents 
delaying absorption, for example, aluminum monostearate and gelatin.

Sterile injectable solutions are prepared by incorporating the active compounds in 
the required amount in the appropriate solvent with various of the other ingredients 
enumerated above, as required, followed by filtered sterilization. Generally, dispersions 
are prepared by incorporating the various sterilized active ingredients into a sterile vehicle 
which contains the basic dispersion medium and the required other ingredients from those 
enumerated above. In the case of sterile powders for the preparation of sterile injectable 
solutions, the preferred methods of preparation are vacuum-drying and freeze-drying 
techniques which yield a powder of the active ingredient plus any additional desired 
ingredient from a previously sterile-filtered solution thereof. 






Upon formulation, solutions will be administered in a manner compatible with the 
dosage formulation and in such amount as is therapeutically effective. The formulations 
are easily administered in a variety of dosage forms, such as the type of injectable 
solutions described above, with even drug release capsules and the like being employable.

For parenteral administration in an aqueous solution, for example, the solution 
should be suitably buffered if necessary and the liquid diluent first rendered isotonic with 
sufficient saline or glucose. These particular aqueous solutions are especially suitable for
intravenous, intramuscular, subcutaneous and intraperitoneal administration. In this
connection, sterile aqueous media which can be employed will be known to those of skill 
in the art in light of the present disclosure. For example, one dosage could be dissolved 
in 1 mL of isotonic NaCl solution and either added to 1000 mL of hypodermoclysis fluid 
or injected at the proposed site of infusion (see, for example, "Remington's
Pharmaceutical Sciences" 15th Edition, pages 1035-1038 and 1570-1580). Some
variation in dosage will necessarily occur depending on the condition of the subject being 
treated. The person responsible for administration will, in any event, determine the 
appropriate dose for the individual subject. 

Q. Examples

The following examples are included to demonstrate preferred embodiments of 
the invention. It should be appreciated by those of skill in the art that the techniques 
disclosed in the examples which follow represent techniques discovered by the inventor 
to function well in the practice of the invention, and thus can be considered to constitute 
preferred modes for its practice. However, those of skill in the art should, in light of the
present disclosure, appreciate that many changes can be made in the specific 
embodiments which are disclosed and still obtain a like or similar result without 
departing from the spirit and scope of the invention. 






EXAMPLE 1 
Methods 

Generation of a physical map and sequence of the NIDDM1 region 

YAC clones containing sequences of interest were identified by screening the 
CEPH 'A' and 'B' Human YAC DNA pools (Research Genetics, Huntsville, AL) using
PCR™ and standard methods. PAC (PAC-6539; Genome Systems, St. Louis, MO) and 
BAC clones (CUB Human BAC DNA Pools - Release IV, Research Genetics) were 
identified in a similar manner. 

DNA was prepared from each clone and tested directly for the presence of each
STS. STSs were selected from the Genethon human genetic linkage map and the human
transcript map in the interval around D2S125-D2S140 (http://www.ncbi.nlm.nih.gov).
Additional STSs were generated by sequencing ends of clones and by sequencing random
PstI fragments from the PACs and BACs after cloning in pGEM-4Z. The sequences of
these clones were compared to those in the nonredundant GenBank database to identify
unmapped ESTs from this region.

The sequence of a 50 kb region including NIDDM1 was assembled from the
sequences of restriction enzyme- (EcoRI, BamHI, HindIII, PstI and Sau3AI) or
PCR™-generated fragments of b204E21 and p278G8. This sequence was examined for putative
exons using the exon prediction and gene modeling program Grail 2
(http://compbio.ornl.gov) and for homology with known sequences in the GenBank
database (http://www.ncbi.nlm.nih.gov) using the BLAST suite of programs. The
sequence was screened for repeated sequences using the programs Grail 2 and
RepeatMasker (Smit AFA, Green P - RepeatMasker at
http://ftp.genome.washington.edu/RM/RepeatMasker.html).

The calpain-like protein and G-protein coupled receptor were examined for
sequence motifs using the program PROSITE (http://www.ebi.ac.uk). Multiple
alignment of amino acid sequences was carried out using the CLUSTAL X software package
(http://www.igbmc.u-strasbg.fr/BioInfo/clustal). Phylogenetic trees were constructed
using the neighbor joining method based on the number of amino acid substitutions.
Bootstrap tests were performed using a random number generator and number of
bootstrap trials of 1,000 and 10,000, respectively. The tree was drawn using the
TREEVIEW package (http://taxonomy.zoology.gla.ac.uk/rod/treeview).

RNA expression studies 

Calpain 10 and GPR35 cDNA fragments were labeled by random priming and
hybridized to Human RNA Master and Multiple Tissue Northern (MTN™) Blots
(Clontech, Palo Alto, CA). Membranes were washed under high-stringency conditions
(55°C, 0.1x SSC and 0.1% SDS) before exposure to X-ray film. The human calpain 10
probe was a 2,484 bp fragment containing the entire coding region and 41 and 427
nucleotides of 5'- and 3'-untranslated region, respectively, and the probe for GPR35 was
a 1,558 bp fragment that included the entire 309 amino acid coding region and 464 and
167 nucleotides of 5'- and 3'-untranslated region, respectively. The tissue distribution of
mouse calpain 10 was determined by hybridization of a mouse cDNA probe encoding
the entire coding region to a mouse MTN™ blot.
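The probe descriptions above can be checked for internal consistency with simple arithmetic: subtracting the stated untranslated lengths from each probe length should leave a coding region that is a whole number of codons. The sketch below is illustrative only; the function name is hypothetical, and it assumes that the stated protein lengths (309 amino acids for GPR35 here, and the 672 amino acid calpain 10 protein described in Example 4) do not count the stop codon.

```python
def coding_length(total_bp, utr5_bp, utr3_bp):
    """Length of the coding region implied by a probe's total and UTR lengths."""
    return total_bp - utr5_bp - utr3_bp

# Figures taken from the probe descriptions in the text.
calpain10_cds = coding_length(2484, 41, 427)   # human calpain 10 probe
gpr35_cds = coding_length(1558, 464, 167)      # GPR35 probe

# Codons implied by each coding length (3 nucleotides per codon).
calpain10_codons = calpain10_cds // 3
gpr35_codons = gpr35_cds // 3
```

Under that assumption, both coding lengths are exact multiples of three and the codon counts agree with the stated protein lengths.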

cDNA cloning 

Human calpain 10 cDNA sequences were obtained by sequencing EST yg33d10
(IMAGE Consortium, Research Genetics), vector-primer and primer-primer amplification
of various human cDNA libraries, and 5'- and 3'-RACE. The 3'-RACE was carried out
using human pancreas Marathon-Ready™ cDNA (Clontech). Vector-primer
amplification of a heart 5'-stretch cDNA library (Clontech) identified a clone having 65
nucleotides upstream of the putative ATG codon. Efforts to obtain additional
5'-untranslated sequence were unsuccessful, possibly because of the GC-rich character of
the sequence. The sequence of mouse calpain 10 cDNA was obtained in a similar manner
including 5'-RACE using liver, heart and skeletal muscle RNA. The sequence of human
GPR35 cDNA was obtained as described for human calpain 10 including 5'- and
3'-RACE.

Identification of SNPs

SNPs were identified by resequencing ESTs, STSs and a 50 kb segment in ten
affected individuals, eight from families in which NIDDM1 was likely to be segregating
and two from families in which variation at NIDDM1 was unlikely to contribute to the
development of type 2 diabetes. Only ten subjects were examined because the inventors
were primarily interested in identifying SNPs with relatively high frequency. Once a SNP
was identified, it was typed by direct sequencing or PCR™-RFLP in 100 additional
patients, thus giving the inventors information on one affected subject from each of 110
families from the original genome-wide screen (including the 10 individuals used for the
original identification of the SNP) and in 112 randomly ascertained Mexican American
controls. All patients and controls were from Starr County, Texas, and surrounding area.

Association studies 

The strategy to identify linkage disequilibrium within the NIDDM1 region was
based on the comparison of allele and haplotype frequencies at SNPs in 110 patients and
112 random controls. Initial analyses were conducted on allele frequencies, using a
chi-square test to assess the significance of differences between patients and controls.
Additional analyses were conducted by comparing the estimates of frequencies of
haplotypes composed of successive SNPs in patients and controls.
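As an illustration of the allele frequency comparison, the following minimal sketch computes a Pearson chi-square statistic for a 2x2 table of allele counts (patients versus controls, allele 1 versus allele 2). The text does not state whether a continuity correction was applied, so none is used here, and the counts in the example are hypothetical rather than taken from Table 5.

```python
import math

def allele_chi_square(case_a1, case_a2, ctrl_a1, ctrl_a2):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 allele-count table."""
    table = [[case_a1, case_a2], [ctrl_a1, ctrl_a2]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: 110 patients (220 alleles) and 112 controls (224 alleles).
chi2, p = allele_chi_square(176, 44, 157, 67)
```

Each subject contributes two alleles, so 110 patients and 112 controls give row totals of 220 and 224.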

A likelihood ratio test was calculated with significance assessed through
simulation studies using random permutation of genotypes within the patient and control
groups. The rationale for considering haplotype frequency differences was that the
inventors might be able to detect linkage disequilibrium across a region defined by
successive SNPs even if they could not detect linkage disequilibrium via the association
tests at the individual SNPs. Since the 110 patients were from the original families used
in the genome-wide screen for type 2 diabetes genes, the inventors had considerable
additional information on the probability that these individuals actually derived
susceptibility from NIDDM1, thereby allowing the inventors to conduct analyses to
confirm the consistency of any findings.
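The general idea of assessing significance by permutation can be sketched as follows. This is not the inventors' procedure: the haplotype likelihood ratio statistic is not described in enough detail to reproduce, so a simple absolute difference in allele frequency stands in for it, and the shuffling scheme (pooling subjects and reassigning group labels) is an assumption. All names are hypothetical.

```python
import random

def allele_freq(genotypes):
    """Frequency of allele 1, with genotypes coded as the count of allele 1 (0, 1 or 2)."""
    return sum(genotypes) / (2 * len(genotypes))

def permutation_p_value(patients, controls, n_perm=10000, seed=0):
    """Empirical P value for the observed frequency difference under label shuffling."""
    rng = random.Random(seed)
    observed = abs(allele_freq(patients) - allele_freq(controls))
    pooled = list(patients) + list(controls)
    n_pat = len(patients)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(allele_freq(pooled[:n_pat]) - allele_freq(pooled[n_pat:]))
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible P of zero.
    return (hits + 1) / (n_perm + 1)
```

The same loop would apply to any test statistic, including a likelihood ratio, by replacing the frequency difference with that statistic.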

Positive findings were followed up by making further comparisons between the
control group and 1) a subset of patients from families providing strong evidence for
linkage to NIDDM1 (NPL>0.7, n=37) and 2) a smaller subset of patients from families
with strong evidence for linkage to the NIDDM1 region of chromosome 2 and the CYP19
region of chromosome 15 (NPL>0.7 at both, n=20). The inventors had the expectation
that any allele or haplotype frequency differences between the overall patient group and
controls that reflected actual linkage disequilibrium between the haplotype or allele and
NIDDM1 should be even stronger in comparisons of the controls with subsets of patients
most likely to have the variation at NIDDM1. This strategy was designed to maximize
the ability to detect disequilibrium by looking for it at both the level of the allele and the
haplotype, but minimize the potential for misleading false positive results.

Linkage studies - 110 families 

Once UCSNP-43 was identified through the association studies as a possible 
candidate for being NIDDM1, the inventors examined the evidence for linkage (using all 
chromosome 2 markers genotyped for the original genome scan) in subsets of the data 
defined on the basis of the genotype at UCSNP-43 in the single member of the 110
families in which an individual was typed. 

Linkage analyses were conducted using a version of GENEHUNTER (Kruglyak et
al., 1996) modified to allow assessment of the evidence for linkage that is not
conservative in the presence of missing data (Kong and Cox, 1997), and all analyses were
conducted using the S(pairs) scoring function. These analyses were facilitated by
development of a recent extension that allows weights for families to be specified. Thus,
families in which no member was typed were assigned weight 0 and similarly, families in
which the single typed member had a non-associated genotype were assigned weight 0,
while families in which the single typed individual had an associated genotype were
assigned weight 1.

The primary calculations are done but once, and then alternative weighting files
can be used to determine the evidence for linkage in any subset of the data defined on the
basis of SNP genotypes in the 110 individuals routinely typed. The inventors compared
results of linkage analyses in subsets of the families defined on the basis of genotypes in
the 110 typed individuals at UCSNP-43 with results of linkage analyses in subsets of the
families defined on the basis of genotypes at the other SNPs. Dominant (1,1+1,2 vs 2,2)
and recessive (1,1 vs 1,2+2,2) models were considered for each SNP.
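The 0/1 family weighting and the dominant and recessive groupings can be sketched together. The code below is a hypothetical illustration and does not reflect the GENEHUNTER weight file format; it assumes allele 1 is the associated allele, that genotypes are written as sorted allele pairs, and that `None` marks a family in which no member was typed.

```python
def associated_genotypes(model):
    """Genotypes grouped with the associated allele (assumed to be allele 1)."""
    if model == "dominant":      # 1,1 + 1,2 versus 2,2
        return {(1, 1), (1, 2)}
    if model == "recessive":     # 1,1 versus 1,2 + 2,2
        return {(1, 1)}
    raise ValueError("model must be 'dominant' or 'recessive'")

def family_weight(typed_genotype, model):
    """Weight 1 if the single typed member has an associated genotype, else 0."""
    if typed_genotype is None:   # no member of the family was typed
        return 0
    return 1 if typed_genotype in associated_genotypes(model) else 0

# Hypothetical families and genotypes of the single typed member.
typed = {"F1": (1, 1), "F2": (1, 2), "F3": (2, 2), "F4": None}
dominant_weights = {fam: family_weight(g, "dominant") for fam, g in typed.items()}
recessive_weights = {fam: family_weight(g, "recessive") for fam, g in typed.items()}
```

Because only the weights change between models and SNPs, the expensive linkage calculation can be run once and reused, which is the design point made in the text.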

Linkage studies - all families 

Linkage analyses for SNPs typed in all members of the 170 sibships were carried
out after first constructing data sets reflecting dominant and recessive transmission of the
associated allele. The use of a pair-based scoring function allows the inventors to
calculate the evidence for linkage in completely non-overlapping sets of affected sib pairs
that are defined on the basis of their genotypes at SNPs. For each model (dominant and
recessive), two data sets were constructed, each of which contained the full genotypic
information at chromosome 2 markers used to determine the probability distribution of
the complete inheritance vector for each family included in that data set but which
included completely non-overlapping sets of affected sib pairs. For example, if allele 1 is
the associated allele, the recessive data set for allele 1 was comprised of all families with
at least two sibs with the 1,1 genotype. All individuals within these families were
included in analyses for obtaining information on the complete inheritance vectors, but
only those individuals who were 1,1 homozygotes were considered as affected and
therefore only those individuals contributed to the assessment of the evidence for linkage.
A complementary group of sibships was constructed from all sibships not included in the
"associated" group. In addition, sibships in which at least two but not all members had
the at-risk genotype were included in this group.

In constructing the complementary set of affected sib pairs, it is sometimes
necessary to duplicate sibships in order to obtain the necessary complementarity without
sacrificing any of the information. Thus, when a sibship contains multiple sibs with and
multiple sibs without the associated genotypes, it is duplicated (number of duplicates =
number of sibs with associated genotypes), and the affection status of sibs is adjusted so that
no pair is included more than once but all pairs are present in one or the other of the
complementary data sets.
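The requirement that every affected sib pair appears exactly once across the two complementary data sets can be illustrated at the level of pairs. This sketch does not reproduce the sibship duplication and affection-status bookkeeping described above; it only shows the intended partition, with hypothetical names and genotypes written as sorted allele pairs.

```python
from itertools import combinations

def partition_sib_pairs(sibship, at_risk):
    """Split all sib pairs of one sibship into two complementary sets.

    sibship: dict mapping sib identifier -> genotype
    at_risk: set of genotypes treated as associated
    Pairs in which both sibs carry an at-risk genotype go to the associated set;
    every other pair goes to the complementary set.
    """
    associated, complementary = [], []
    for a, b in combinations(sorted(sibship), 2):
        if sibship[a] in at_risk and sibship[b] in at_risk:
            associated.append((a, b))
        else:
            complementary.append((a, b))
    return associated, complementary

# Hypothetical sibship of four affected sibs under the recessive model for allele 1.
sibs = {"s1": (1, 1), "s2": (1, 1), "s3": (1, 2), "s4": (2, 2)}
assoc_pairs, comp_pairs = partition_sib_pairs(sibs, at_risk={(1, 1)})
```

With four sibs there are six possible pairs; the two sets together contain each of them once.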

EXAMPLE 2 

Physical Mapping of NIDDM1

Initial linkage studies localized NIDDM1 to the distal long arm of chromosome 2
near D2S125. Further genotyping and refinement of the genetic map placed NIDDM1
near D2S140 at 263.56 cM in the genetic map (Broman et al., 1998) with a 1-lod
confidence interval from 257-269 cM, a 12 cM interval which included D2S125 (260.63
cM). Although the 1-lod confidence interval for NIDDM1 was quite large and thus made
the identification of NIDDM1 a rather formidable task, subsequent genetic studies
identified a region on chromosome 15 near CYP19 which interacts with NIDDM1 to
affect susceptibility to type 2 diabetes.

Taking the evidence for linkage at chromosome 15 into account in linkage
analyses on the NIDDM1 region of chromosome 2 increased the lod score from 4.0 to 7.3
and decreased the 1-lod confidence interval from 12 to 7 cM (i.e., from 259-266 cM). The
inventors focused their search for NIDDM1 on the 7 cM interval from 259-266
cM, knowing that they might have to extend the search if they
did not find the variation responsible for NIDDM1 in this region.

A combined YAC, BAC and PAC contig (FIG. 2) centered on D2S140 was
generated using information in public databases and by screening YAC, BAC and PAC
libraries with markers from the Genethon human genetic linkage map and STSs for
known genes and ESTs that had been localized to the region of NIDDM1. Additional
STSs were generated from the ends of PAC and BAC clones and by random sequencing
of fragments of these clones. This contig was defined by 73 STSs and spanned a region
of about 1.7 Mb. It included the 5.1 cM interval between D2S2285 and D2S140
(258.49-263.56 cM) and a smaller interval of unknown genetic size telomeric to D2S140 (FIG. 2).
Thus, this contig may encompass most if not all of the region defined by the 1-lod
confidence interval (259-266 cM) based on the interaction between NIDDM1 and the
locus on chromosome 15 near CYP19. Fluorescent in situ hybridization with PAC 179G9
placed this contig in chromosome band 2q37.3, the most distal band of the long arm of
chromosome 2.

Comparison of the sizes of the genetic and physical maps indicated that the
NIDDM1 region was characterized by higher than average recombination so that 1 cM
corresponded to about 240 kb, a result consistent with the telomeric location. This result was
advantageous in that it reduced the size of the interval over which the search for NIDDM1
would be conducted. However, it also represented a disadvantage in that the levels of
linkage disequilibrium would likely be decreased over this region.
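As a rough worked example, the reported ratio of about 240 kb per cM can be applied to the 7 cM 1-lod confidence interval (259-266 cM) to estimate its physical size. The variable names are illustrative, and the calculation assumes the ratio is uniform across the interval, which the text does not claim.

```python
KB_PER_CM = 240          # approximate physical distance per cM reported for this region
INTERVAL_CM = 266 - 259  # width of the 1-lod confidence interval, in cM

interval_kb = INTERVAL_CM * KB_PER_CM   # estimated physical size in kb
interval_mb = interval_kb / 1000        # the same estimate in Mb
```

Under that assumption the interval comes to roughly 1.68 Mb, which is in line with the statement that the approximately 1.7 Mb contig may encompass most if not all of the confidence interval.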

The physical map enabled the inventors to begin a systematic search for NIDDM1.
The inventors focused their attention initially on the expressed sequences
localized in the physical map in this region identified during the course of assembling the
contig. These included several known genes (GPC1, ATSV, AGXT, HDLBP, NEDD5,
sds22-like, serine/threonine kinase-like), none of which were obvious candidates, and 15
ESTs (FIG. 2). SNPs were identified in these expressed sequences by resequencing STSs
in a panel of ten unrelated diabetic subjects, eight of whom were selected because they
were members of sibships in which NIDDM1 was likely to be segregating and two were
from sibships in which variation at NIDDM1 was unlikely to contribute to the
development of type 2 diabetes. Only ten subjects were examined because the inventors
were primarily interested in identifying SNPs with relatively high frequency. Once a SNP
was identified, it was typed by direct sequencing or PCR-RFLP in 100 additional patients,
thus giving the inventors information on one affected subject from each of 110 families
from the original genome-wide screen (including the 10 individuals used for the original
identification of the SNP) and in 112 random controls.

EXAMPLE 3 

Identification of NIDDM1

Allele and haplotype frequencies were compared among controls (n=112), patients
(patients all, n=110), the subgroup of patients from families most likely to have
susceptibility at NIDDM1 (patients NIDDM1, n=37) and subsequently with a smaller
subgroup from families most likely to have susceptibility at NIDDM1 and the interacting
locus near CYP19 on chromosome 15 (patients NIDDM1/CYP19, n=20), once this
interaction became evident. The expectation was that the degree of association would
increase as the inventors examined those patients with type 2 diabetes due to variation at
NIDDM1 or to the interaction between NIDDM1 and the unknown diabetes susceptibility
locus on chromosome 15. The inventors began the search for NIDDM1 by first typing
SNPs that the inventors had identified in known genes and ESTs and comparing the
frequencies of alleles and estimated haplotypes formed between adjacent markers (most
haplotypes were between two adjacent markers although occasionally three were
examined if the STS contained multiple SNPs) (FIG. 2, Table 5).

The results of these comparisons were used to focus the inventors' search
including the identification of new SNPs found by shotgun sequencing of fragments of
the BAC and PAC clones in ten unrelated diabetic subjects.

The control and various patient groups did not differ in allele frequencies at any of
the first 15 SNPs examined (UCSNP-1-to-15, FIG. 2; Table 5). However, comparison of
estimated haplotype frequencies between controls and patients (patients all) revealed a
significant difference (P<0.05) in the frequencies of haplotypes comprised of the three
markers UCSNP-1, -2 and -15. Moreover, the haplotype frequencies estimated from the
subset of patients from families with evidence for linkage at NIDDM1 were also
significantly different (P<0.01) from those estimated in controls even though the sample
size was reduced by 50%. Therefore, the region between UCSNP-15 and UCSNP-1-to-4,
a region of about 250 kb, became the primary focus of the inventors' search although
the inventors also continued to type SNPs outside this region in order to reinforce the
inventors' conclusions with regard to the location of NIDDM1. The inventors observed
significant allele frequency differences between patients and controls at UCSNP-18
(P=0.019) which was distal to the inventors' primary region of interest; however, allele
frequencies did not differ between controls and either subgroup of patients suggesting
that NIDDM1 was unlikely to be in the region of this marker (Table 5).

A cluster of four SNPs in the interval between UCSNP-15 and UCSNP-1-to-4
showed a significant difference in allele frequencies between patients and controls:
UCSNP-26, P=0.020; UCSNP-25, P=0.034; UCSNP-23, P=0.020; and UCSNP-22,
P=0.0129 (Table 5). As expected because of their proximity to one another, there was
strong linkage disequilibrium between these four SNPs. The associated alleles were also
present at higher frequency on the haplotypes that led the inventors to focus on this
region in the first instance. The consistency of the findings led the inventors to focus
their attention on the region around UCSNP-22-to-26 and new SNPs flanking this
cluster. The results of this continuing search suggested that NIDDM1 was in the interval
between UCSNP-20 and the cluster of SNPs, UCSNP-22-to-26.

At UCSNP-43, the inventors observed a striking increase in the frequency of the
common allele in the patient and patient subgroups compared to controls (Table 5,
FIG. 3). The increasing frequency of the associated allele at UCSNP-43 from 0.73 in
controls to 0.95 in the patient-NIDDM1/CYP19 subgroup raised the possibility that
NIDDM1 was transmitted as a high frequency recessive. The inventors therefore
examined the evidence for linkage in the subgroups of patients defined by SNP genotypes
in the single typed individual. UCSNP-43 generated a lod score of 4.15 in just 67 of 110
sibships in which the single typed patient was homozygous for the common allele. Thus,
UCSNP-43 was associated with type 2 diabetes, provided disproportionate evidence for
linkage in the families of patients homozygous for the common allele and was the first
marker to show compelling evidence for both (Table 5, FIG. 3). UCSNP-43 was then
typed in all members of the 170 sibships comprising the primary affected sibpair dataset.
Sibships in which all sibs were homozygotes for the common "G" allele accounted for
49% of all sibships and the affected sibpairs from these families accounted for 45% of the
330 affected sib pairs. The multipoint lod score in these families was 10.19. The
multipoint lod score in the complement of the data (51% of families and 55% of affected
sib pairs) was 0 across the entire 2qter region. Thus, all the evidence for linkage between
type 2 diabetes and the NIDDM1 region can be accounted for by homozygosity of the
G-allele at UCSNP-43. UCSNP-43 is the prime candidate for being the variation
responsible for NIDDM1.

In order to be certain that there were no other SNPs that might provide
comparable or even stronger evidence for being NIDDM1, and to be sure that no other
variants in the gene containing UCSNP-43 might be alternative NIDDM1 susceptibility
alleles, a 50 kb region around this SNP was resequenced in ten patients to identify all the
variation in this region. All high frequency SNPs, i.e., allele frequencies between
0.25-0.75, not in complete linkage disequilibrium with a previously typed SNP (the inventors
found that SNPs with perfect genotypic correspondence in the ten unrelated patients were
invariably in strong linkage disequilibrium with each other) were then typed in at least the
110 patients for comparison with the results obtained with UCSNP-43. In addition, all
members of the 170 sibships comprising the primary affected sibpair dataset were typed
at SNPs selected for their proximity and strong linkage disequilibrium to UCSNP-43,
association with type 2 diabetes or disproportionate evidence for linkage (Table 5). Of
the 60 SNPs examined, only UCSNP-43 can adequately account for the linkage of this
region with type 2 diabetes.

Some of the polymorphisms studied exhibited a stronger baseline association with
type 2 diabetes in the comparison of allele frequencies between cases and controls than
does UCSNP-43. However, many of the associations (e.g., UCSNP-38 and -39) become
weaker rather than stronger as the inventors consider the subgroups of patients most
likely to come from families segregating for NIDDM1, and the evidence for linkage in
subsets of families defined on the basis of genotypes (in the single typed member of the
family) at these loci is largely proportional to the number of families in the subset. In
contrast, the allelic associations at UCSNP-43 became stronger when examined in
subgroups of patients most likely to come from families segregating for NIDDM1.
Moreover, the evidence for linkage in subsets of families defined by SNP genotype also
increased and this increase was disproportionate to the number of families in the subset of
the data examined.

Fourteen of the 60 SNPs examined (Table 5) showed nominal evidence for
association with type 2 diabetes (i.e., P<0.05) in comparisons of control and patient-all
groups. In fact, some were more than 40 kb from NIDDM1/UCSNP-43 which itself did
not show evidence for association in a direct comparison of controls and patients-all but
was associated in the patient-NIDDM1 and -NIDDM1/CYP19 subgroups. The failure to
achieve statistical significance in the patient-all group is due to the high frequency of the
associated allele and the relatively small sample size. Thus consideration of only the
association data between controls and the patient-all groups would not have provided the
identity of NIDDM1. The analyses that addressed the evidence for linkage enabled the
inventors to distinguish which polymorphism was NIDDM1.

In addition to typing UCSNP-43 in the primary set of 170 sibships (330 possible
affected sibpairs) used in the genome-wide screen for type 2 diabetes genes, the inventors
also typed it in a second smaller group of 76 sibships (110 affected sibpairs) that also
provided evidence for linkage with markers near NIDDM1. Homozygosity for the
common G-allele at UCSNP-43 can account for all of the evidence for linkage originally
reported in this sample as well (Table 5).

EXAMPLE 4

NIDDM1 is a Novel Calpain-like Protease 

The analysis of the sequence of the 49,136 bp region (SEQ ID NO:1) around
UCSNP-43 revealed two genes, one encoding a novel calpain-like cysteine protease,
designated calpain 10 (gene symbol CAPN10) (Saido et al., 1994; Carafoli and Molinari,
1998; Dear et al., 1997) part of which was homologous to the ESTs yg33d10, nf61d12
and yb22d04, and the second, a recently described G-protein coupled receptor GPR35
(O'Dowd et al., 1998), most similar in sequence to the P2Y-family of ATP receptors. No
other excellent or good potential coding regions were predicted using Grail 2. The entire
49,136 bp region is found as SEQ ID NO:1.

RNA blotting studies showed that calpain 10 mRNA was ubiquitously expressed
and the major 2.7 kb transcript could be readily detected in all human adult and fetal
tissues examined (FIG. 4). The isolation and characterization of human calpain cDNAs
gave a composite sequence of 2,620 nucleotides excluding polyA tract and including 177
nucleotides of putative 5'-untranslated region. This sequence is shown as SEQ ID NO:3.
This sequence contains an ORF that encodes a protein of 672 amino acids (SEQ ID
NO:2) related in structural organization and sequence with members of the calpain large
subunit superfamily (FIG. 5). This ORF begins with the second ATG codon (both ATG
codons are in an adequate context to be start sites for translation) (Kozak et al., 1996) and
is not preceded by an in-frame stop codon.

Conceptual translation beginning at the first ATG predicts the sequence of a
protein of 65 amino acids that is unrelated to any in the GenBank data base. Since
translation usually begins with the first ATG codon (Kozak et al., 1996), this result
suggests that the human calpain 10 cDNA may lack the authentic initiator codon. Using
5'-RACE and other strategies, the inventors were unable to obtain additional
5'-untranslated sequence. Thus, in order to confirm this ORF, the inventors isolated
cDNA clones encoding the mouse orthologue since they expected the homology between
the human and mouse sequences to be well conserved in the protein coding region and
more divergent in the 5'- and 3'-untranslated regions. The 2,511 bp composite mouse
calpain 10 cDNA (SEQ ID NO:19) encoded a protein of 666 amino acids (SEQ ID
NO:18) having 81.7% identity with the human protein. There is 83.4% identity between
the sequences of the predicted coding regions of the mouse and human cDNAs and the
homology dissipates outside of these regions. The longest ORF in mouse calpain 10
mRNA begins with the third ATG codon which is preceded upstream by an in-frame stop
codon. The first and second ATG are in the same frame and are preceded by an in-frame
stop codon. There are stop codons in all three reading frames in the 109 bp upstream of
the putative start of translation. The sequence around the first ATG codon is highly
divergent between human and mouse and becomes more similar in the region of the
second out-of-frame ATG codon. The inventors infer from these results that translation is
initiated at the second ATG codon in the human sequence and at the third in the mouse.
The implications of the presence of upstream ATG codons for the regulation of
expression of calpain 10 are unknown.
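For readers unfamiliar with how a figure such as the 81.7% identity is obtained, the sketch below computes percent identity between two pre-aligned sequences. It is an assumption-laden illustration: the text does not say how gapped positions were treated, so columns containing a gap are simply skipped here, and the short sequences in the example are invented.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned, ungapped columns in which the two residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # columns containing a gap are skipped in this sketch
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Invented six-column alignment: four matches out of five ungapped columns.
example_identity = percent_identity("MKT-LV", "MKSALV")
```

Other conventions (for example, counting gaps in the denominator) would give somewhat different values for the same alignment.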

The human CAPN10 consists of 15 exons spanning 32 kb (FIG. 3). The analysis
of human cDNA clones revealed a complex pattern of alternative splicing generating, in
addition to the protein of 672 amino acids described above (SEQ ID NO:2), proteins of
544 (SEQ ID NO:4), 517 (SEQ ID NO:6), 513 (SEQ ID NO:8), 444 (SEQ ID NO:10),
274 (SEQ ID NO:12), 139 (SEQ ID NO:14) and 138 amino acids (SEQ ID NO:16),
designated calpain 10a to 10h (FIG. 1). RT-PCR™ studies suggest that transcripts
encoding calpain 10a are the most abundant in the various tissues examined. Calpain
10b, 10c and 10g were readily detectable in many tissues including skeletal muscle and
islets, and calpain 10h was present at moderate levels only in islets of the tissues tested.
The other forms, calpain 10d to 10f, are much less abundant. Studies of mouse calpain 10
expression showed a 2.7 kb transcript that could be detected in all tissues examined.
Thus, calpain 10 appears to be ubiquitously expressed in both mouse and human tissues.

The nucleotide variant showing all the evidence for linkage with type 2 diabetes, 
UCSNP-43, is located in intron 3 of CAPN10 (FIG. 4) 746 bp downstream of the splice 
donor site and 176 bp upstream of the splice acceptor site. The molecular mechanism by 
which the G-to-A polymorphism at UCSNP-43 affects susceptibility to type 2 diabetes is 
unclear. As shown in FIG. 5, there is alternative splicing of intron 3. However, the 
inventors' RT-PCR™ studies suggest that this is a relatively rare event and it remains to
be determined whether it is influenced by the polymorphism at UCSNP-43. The 
inventors have also considered the possibility that there is a gene embedded within this 
intron. Translation of intron 3 in all frames revealed a small ORF in the reverse strand 
that could encode a protein of 95 amino acids. This protein has no homology with any in 
the GenBank database and the variant at UCSNP-43 would be a silent mutation. In 
addition, this ORF is not conserved in the sequence of intron 3 of the mouse gene 
strongly suggesting that it is not an exon.

There are only three polymorphisms in exons of CAPN10: in exon 11, a silent
substitution in codon 620 (UCSNP-48, Table 5); in exon 13, a nucleotide substitution
resulting in a Val-to-Ile change in codon 666 (q(l)=0.98) (UCSNP-58); and a
polymorphism in the 3'-untranslated region. None of these can account for the evidence
of linkage of this region with type 2 diabetes. 

In addition to CAPN10, the NIDDM1 interval included the gene encoding a
recently identified member of the G-protein coupled receptor superfamily, GPR35
(O'Dowd et al., 1998). The sequence of GPR35 is most similar to that of a putative
purinoceptor, P2Y9 (34.4% identity), suggesting that ATP or another nucleotide may be its
ligand. Hybridization to RNA Master Blots showed low levels of GPR35 mRNA in all
adult and fetal tissues with relatively higher levels in adult lung, small intestine, colon
and stomach. In these tissues, there are two major transcripts of 2.4 and 4.4 kb, whereas
in skeletal muscle there is a single transcript of 9.4 kb. The composite cDNA is 1,875 bp
(exclusive of polyA tract, SEQ ID NO:21) and may lack about 400 bp of the
5'-untranslated region. It encodes a protein of 309 amino acids (SEQ ID NO:20) having
all the features of a G-protein coupled receptor including seven membrane-spanning
segments. Translation is predicted to begin at the third ATG codon, which is preceded by
an in-frame stop codon (the two upstream ATG codons, which are in the same reading
frame and closely followed by a stop codon, are in a poor context to serve as translational
start codons). The putative initiation codon is also in a poor context for initiation, and
translation may start at codon 14, which is in a strong context. The GPR35 cDNA and
gene sequences are colinear, suggesting that the GPR35 gene consists of a single exon. The
sequence of the GPR35 gene is also highly polymorphic with six nucleotide substitutions
associated with amino acid polymorphisms (including UCSNP-38 and -53), three silent
substitutions and three and two polymorphisms in the 5'- (UCSNP-49, -50 and -51) and
3'-untranslated (UCSNP-40) regions of the mRNA, respectively. While there is
association of several of these polymorphisms with type 2 diabetes in Mexican
Americans, they cannot account for the evidence of linkage (Table 6).

EXAMPLE 5

Improved Localization of NIDDM1 by Linkage Analyses 

A previous genome-wide screen for type 2 diabetes genes in Mexican Americans
localized a major susceptibility gene, NIDDM1, to the D2S125-D2S140 region of
chromosome 2 (Hanis et al., 1996) (multipoint lod score = 4.03). This was the only
region in the primary analyses to meet genome-wide criteria for significance. Animal
studies have suggested that type 2 diabetes may result, at least in part, from epistatic
interactions between genes (Terauchi et al., 1997; Brunning et al., 1997). In addition,
some alleles at genes associated with monogenic forms of diabetes such as maturity onset
diabetes of the young (MODY, a genetically heterogeneous form of diabetes
characterized by autosomal dominant inheritance, onset usually before age 25 and
pancreatic β-cell dysfunction) may cause a form of diabetes that resembles type 2
diabetes (Mahtani et al., 1996; Iwasaki et al., 1997).

The inventors examined the evidence for statistical interactions between NIDDM1
and the ten other autosomal regions providing nominal evidence for linkage (p<0.05) in
the study by Hanis et al. (1996) as well as five regions containing genes associated with
MODY (Table 6). Two regions, CYP19 on chromosome 15, and the hepatocyte nuclear
factor (HNF)-1α/MODY3 gene on chromosome 12, showed significant correlations
between their NPL scores and NPL scores at NIDDM1 even after Bonferroni correction
for the number of correlations examined. The methods and results related to these studies
are described in further detail below.

Table 6. Correlations between NPL Scores at NIDDM1 and Autosomal Regions
Nominally Significant in Genome-Wide Screen of Type 2 Diabetes and Five Loci
Associated with MODY

Region         Correlation    Corrected P-value*   Baseline LOD   NIDDM1-Weighted LOD
CYP19           0.288         2.1 x 10^-3          1.27           4.00 (weight0-1)
D7S502          0.180         0.29                 0.76           1.31 (weight0-1)
D3S3054         0.098         ns                   0.81           0.42 (weight0-1)
D2S377          0.085         ns                   1.28           1.50 (weight0-1)
D15S104         0.066         ns                   0.93           1.20 (weight0-1)
D3S2452         0.031         ns                   1.24           0.81 (weight0-1)
D2S441          0.027         ns                   0.78           0.50 (weight0-1)
D12S379        -0.012         ns                   0.68           0.30 (weight1-0)
D11S1314       -0.059         ns                   [illegible]    [illegible] (weight0-1)
[illegible]    [illegible]    ns                   [illegible]    1.21 (weight1-0)
GCK             0.124         ns                   0.01           0.26 (weight0-1)
HNF-1α         -0.228         0.04                 0.01           1.03 (weight1-0)
HNF-1β          0.010         ns                   0.00           0.00 (weight1-0)
HNF-4α          0.003         ns                   0.38           0.35 (weight1-0)
IPF1           -0.187         0.24                 0.32           1.11 (weight1-0)

*P-values were corrected by multiplying the nominal P-value by the number of
correlations examined (15), and numerical values are given only for those
loci in which the uncorrected P-values were nominally significant (P<0.05).
The marker used for HNF-1α was GATA32A10, for HNF-1β was
D17S1788, for HNF-4α was ADA, and for IPF1 was D13S221.

Methods 

Genome scan data on 524 autosomal markers genotyped in 424 individuals from
170 Mexican American sibships originally described in Hanis et al. (1996) were used for
the analyses described here. A region near D2S140 provided strong evidence for linkage
to type 2 diabetes in Mexican Americans (NIDDM1, lod = 4.03, P<[illegible] x 10^-6). NPL scores
from this region were used in calculating correlations with each of the other ten
autosomal regions providing nominally significant (P<0.05, MLS > 0.59) evidence for
linkage. Correlations were also calculated between the NPL scores at NIDDM1 and five
regions from which MODY genes have been characterized (GCK (Frogel et al., 1993),
HNF-1α (Yamagata, 1996a), HNF-1β (Horikawa et al., 1997), HNF-4α (Yamagata,
1996b) and IPF1 (Stoffers et al., 1997)).

Analyses in which the evidence for linkage at NIDDM1 was used to weight the
contribution from families in linkage analyses on these 15 regions were also conducted.
In the weight0-1 family weighting, families were assigned weight 0 if their NPL score at
NIDDM1 (D2S140, the location providing the strongest evidence for linkage in the
NIDDM1 region) was 0 or negative and weight 1 if their NPL score at NIDDM1 was
positive. In the weight1-0 family weighting, families were assigned weight 1 if their NPL
score at NIDDM1 was negative and weight 0 if their NPL score at NIDDM1 was 0 or
positive. In the weightPROP family weighting, the weight for families with positive NPL
scores was calculated as NPL/NPLmax, where NPLmax was the maximum NPL score
observed in any family, and the weight for families with negative NPL scores was 0.
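
The three weighting schemes can be sketched in a few lines of Python. This is only an illustration of the definitions given above, not the GENEHUNTER-PLUS code; the function name `family_weights` and the example NPL scores are hypothetical.

```python
from typing import Dict, List


def family_weights(npl_scores: List[float]) -> Dict[str, List[float]]:
    """Return the three family-weighting schemes described in the text.

    npl_scores holds one NPL score per family at the conditioning locus.
    """
    # NPLmax is the largest NPL score observed in any family.
    npl_max = max(npl_scores)
    if npl_max <= 0:
        raise ValueError("weightPROP needs at least one positive NPL score")
    return {
        # weight0-1: 1 for a positive score, 0 for a zero or negative score.
        "weight0-1": [1.0 if s > 0 else 0.0 for s in npl_scores],
        # weight1-0: 1 for a negative score, 0 for a zero or positive score.
        "weight1-0": [1.0 if s < 0 else 0.0 for s in npl_scores],
        # weightPROP: NPL / NPLmax for positive scores, otherwise 0.
        "weightPROP": [s / npl_max if s > 0 else 0.0 for s in npl_scores],
    }


# Hypothetical NPL scores for five families (illustrative values only).
example = family_weights([2.0, 0.5, 0.0, -1.2, 1.0])
```

A family with an NPL score of exactly 0 receives weight 0 under all three schemes, which is how the sketch treats the case the weightPROP definition leaves implicit.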

Simulation studies were used to assess the significance of the increase in lod score
at CYP19 using the weight0-1 family weighting with respect to the evidence for linkage at
NIDDM1. At D2S140 there were 95 families with positive NPL scores and 75 families
with 0 or negative NPL scores. Simulations based on the weight0-1 or weight1-0 family
weighting can be rapidly conducted using the extension which allows families to be
weighted individually. The basic GENEHUNTER analysis need be conducted only once
on the actual data (in this case, from chromosome 15), and then many replicate weighting
files can be generated randomly (in this example, 95 randomly chosen families are given
weight 1 and the remaining 75 families are given weight 0) and used to calculate the final
lod scores.

The software described in this manuscript is distributed as GENEHUNTER-PLUS
(version 2.0 or later) and is available via anonymous ftp at galton.uchicago.edu in the
/pub/kong directory. The allele-sharing method which is used is described in Kong and
Cox (1997) and version 2.0 introduces an option to provide a family-specific weight in
the lod score computation.
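
The random-weighting simulation described above can be outlined as follows. This is a minimal sketch under stated assumptions: `toy_statistic` is a hypothetical stand-in for the weighted lod score (the real value would come from GENEHUNTER-PLUS), and the per-family contributions are randomly generated placeholders rather than data.

```python
import random
from typing import Callable, List, Sequence


def random_weight_file(n_families: int, n_weighted: int,
                       rng: random.Random) -> List[int]:
    """Give weight 1 to n_weighted randomly chosen families and 0 to the rest."""
    chosen = set(rng.sample(range(n_families), n_weighted))
    return [1 if i in chosen else 0 for i in range(n_families)]


def simulated_p_value(statistic: Callable[[Sequence[int]], float],
                      observed: float, n_families: int, n_weighted: int,
                      n_replicates: int, seed: int = 0) -> float:
    """Fraction of random weightings whose statistic reaches the observed value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_replicates):
        weights = random_weight_file(n_families, n_weighted, rng)
        if statistic(weights) >= observed:
            hits += 1
    return hits / n_replicates


# Toy stand-in for the weighted lod score: a weighted sum of hypothetical
# per-family contributions. It is NOT the GENEHUNTER-PLUS computation.
_rng = random.Random(1)
toy_scores = [_rng.gauss(0.0, 1.0) for _ in range(170)]


def toy_statistic(weights: Sequence[int]) -> float:
    return sum(w * s for w, s in zip(weights, toy_scores))


weights = random_weight_file(170, 95, random.Random(2))
p = simulated_p_value(toy_statistic, observed=toy_statistic(weights),
                      n_families=170, n_weighted=95, n_replicates=200)
```

The family counts (170 in total, 95 given weight 1) follow the text; the number of replicates is kept small here only so the sketch runs quickly.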

Results 

The lod in the CYP19 region was 1.3 in the baseline analysis but increased to 4.0
when the families were weighted by their evidence for linkage at NIDDM1 using
weight0-1, and to 4.1 when families were weighted by their evidence for linkage using
weightPROP (FIG. 7A). Note that the more distal region of chromosome 15 with similar
baseline evidence for linkage does not show a comparable increase in lod when the
evidence for linkage at NIDDM1 is taken into account. However, the lod score at
NIDDM1 rises from 4.0 in the baseline analyses to 5.6 when families were weighted by
their evidence for linkage at CYP19 using weight0-1 and to 7.3 using weightPROP
(FIG. 7B). In simulations conducted to determine the significance of the increase in the
lod at CYP19 from 1.3 to >4.0, the inventors found that none of 10,000 replicates from a
simulation in which 95 families (the number of families in these data with positive NPL
scores at NIDDM1) were randomly chosen and analyzed for the actual chromosome 15 data
had a lod score as large as 4.0, although 4 (of 10,000) yielded lods between 3.5 and 4.0.
Thus, a reasonable estimate of the nominal significance of the increase in lod from 1.3 to 4.0
based on simulation is 0.0001, or 0.0015 corrected for the number of regions examined.
The conservative χ² test described above would be calculated as
2 ln(10) x (4.0 - 1.3) = 12.4, giving a P-value of 0.0004. The P-value obtained in this way is
indeed comparable to the P-values obtained from the correlation test and the simulations,
but is more modest (conservative) because the inventors have not actually maximized the
evidence for linkage over family-specific weights (for example, the lod score for
weightPROP is 4.1).
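
The arithmetic behind the χ² figure and the corrected simulation estimate can be checked with a short script. It assumes the statistic is referred to a χ² distribution with 1 degree of freedom, which reproduces the quoted P-value of about 0.0004 but is not stated explicitly in this passage; the function names are illustrative.

```python
import math


def lod_increase_to_chi2(lod_weighted: float, lod_baseline: float) -> float:
    """Convert an increase in lod score into a likelihood-ratio statistic."""
    return 2.0 * math.log(10.0) * (lod_weighted - lod_baseline)


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


stat = lod_increase_to_chi2(4.0, 1.3)   # about 12.4
p_chi2 = chi2_sf_1df(stat)              # about 0.0004 (assumes 1 df)

# Correction used in the text: multiply the nominal value by the 15 regions examined.
p_sim_corrected = min(1.0, 0.0001 * 15)
```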

The CYP19 region of chromosome 15 was the only location besides NIDDM1 to
be replicated (P<0.05) in a smaller, independent sample of Mexican American families
(Hanis et al., 1996). This, as well as the evidence for statistical interaction between these
regions, suggests that in collections of Mexican American families similar in size to that
in the original genome scan, the evidence for linkage in analyses of chromosome 15
might sometimes be more prominent than that for NIDDM1, and that in many such
collections, the signals from both regions might be comparable and only modest unless
the interaction is properly taken into account. Thus, it is possible that some of the
difficulties recognized in replicating results obtained in genome scans for complex
disorders (Suarez et al., 1994) might be alleviated by conducting analyses to identify
potential interactions. Finally, the improvement in localization offered by linkage
analyses which allow for the contributions of multiple susceptibility loci may be critical
to the successful positional cloning of genes for complex disorders.



EXAMPLE 6

The Presence of NIDDM1 Is Associated with Increased Risk for
the Development of Type 2 Diabetes in a Predisposed Population

In order to determine whether evidence that the presence of NIDDM1 is associated
with increased risk for the development of type 2 diabetes in a predisposed population
could be detected, 106 Mexican American subjects from Starr County, Texas, were
selected, each of whom had at least two first degree relatives with type 2 diabetes but
none of whom had a personal history of previously diagnosed diabetes.



Each subject underwent a standard oral glucose tolerance test. This is a standard
test used to measure the response of islet cells to a glucose bolus and is currently
recognized as the test in most widespread use for diabetes detection. After an overnight
fast, blood samples for the measurement of glucose and insulin were obtained before (-15
min and 0 min) and after (30, 60, 90 and 120 min) the ingestion of 75 g glucose orally.
The subjects were classified into two groups. The first was homozygous for the G allele
at UCSNP-43 (GG, n=57) and the second was either homozygous for the A allele (AA,
n=15) or heterozygous (GA, n=34) at UCSNP-43 (combined AA/GA, n=49). The results
of this study are shown in Table 7 below, which depicts average glucose and insulin
concentrations in both groups of subjects before and after glucose ingestion.

Table 7. Average glucose and insulin concentrations in homozygous and heterozygous
individuals

                   Genotype   -15 min   0 min   30 min   60 min   90 min   120 min
Glucose (mg/dl)    GG         103       103     181      193      175      147
Glucose (mg/dl)    AA/GA      101       101     180      187      160      133
Insulin (µU/ml)    GG         15.8      17.2    97.8     144.9    138.3    120.9
Insulin (µU/ml)    AA/GA      16.4      17      123.6    157.2    130.5    108.9

Fasting glucose concentrations were within the normal range (<110 mg/dl) in both
groups. Following glucose ingestion, glucose concentrations increased as expected. In
the AA/GA subjects, the average glucose concentration had fallen to below 140 mg/dl by
120 min. This is the threshold value that defines normal glucose tolerance. However, in
the GG subjects, glucose concentrations remained elevated, and at 120 min had fallen to
only 147 mg/dl, a level defined as impaired glucose tolerance by WHO criteria.

Insulin concentrations were elevated in both groups after the overnight fast, i.e., at
-15 and 0 min. In normal insulin-sensitive individuals the fasting insulin concentration is
usually around 7 µU/ml and rarely exceeds 10 µU/ml. The presence of fasting
hyperinsulinemia suggests the presence of insulin resistance. After glucose ingestion,
there was a rapid increase in insulin levels in the AA/GA subjects, and this brisk insulin
secretory response is presumably responsible for the normal response in glucose
concentrations. In the GG subjects, however, the insulin secretory response to glucose
ingestion was significantly reduced at 30 min. Thus, at 30 min after glucose ingestion, the
increment in insulin levels over baseline values in the subjects with the GG genotype was
significantly lower than in the subjects with the AA/GA genotype (82.0 vs. 107.3 µU/ml,
P<0.043). At 90 and 120 min, insulin concentrations were higher in the subjects with the
GG genotype, presumably as a response to the continued elevation in plasma glucose
concentrations.
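
The group-level quantities discussed above can be approximated directly from the means in Table 7. The sketch below assumes that baseline insulin is the mean of the -15 and 0 min samples, which the text does not state; because it works from group means rather than per-subject data, its increments (about 81.3 and 106.9 µU/ml) differ slightly from the reported 82.0 and 107.3 µU/ml.

```python
# Group means transcribed from Table 7 at -15, 0, 30, 60, 90 and 120 min.
TIMES_MIN = [-15, 0, 30, 60, 90, 120]
GLUCOSE_MG_DL = {
    "GG": [103, 103, 181, 193, 175, 147],
    "AA/GA": [101, 101, 180, 187, 160, 133],
}
INSULIN_UU_ML = {
    "GG": [15.8, 17.2, 97.8, 144.9, 138.3, 120.9],
    "AA/GA": [16.4, 17.0, 123.6, 157.2, 130.5, 108.9],
}


def insulin_increment_30(values):
    """30-min insulin minus the mean of the two fasting samples (assumed baseline)."""
    baseline = (values[0] + values[1]) / 2.0
    return values[TIMES_MIN.index(30)] - baseline


def two_hour_glucose_below_140(values):
    """True when the 120-min glucose mean is below the 140 mg/dl threshold cited in the text."""
    return values[TIMES_MIN.index(120)] < 140


increments = {g: round(insulin_increment_30(v), 1) for g, v in INSULIN_UU_ML.items()}
normal_at_2h = {g: two_hour_glucose_below_140(v) for g, v in GLUCOSE_MG_DL.items()}
```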

Thus, Mexican American subjects possessing a family history of diabetes who do
not have diabetes themselves but who are homozygous GG at UCSNP-43 demonstrate a
number of abnormalities on oral glucose tolerance testing. First, these individuals
demonstrate fasting hyperinsulinemia, suggesting the presence of insulin resistance.
Second, these individuals have elevated average plasma glucose concentrations 120 min
after ingestion of 75 g glucose orally to within a range that defines impaired glucose
tolerance, a condition widely recognized to be associated with a significantly increased risk
for the subsequent development of type 2 diabetes. Further, these individuals
characteristically have reduced insulin concentrations 30 min after ingestion of 75 g
glucose. Reduced insulin concentrations in response to the oral ingestion of nutrients are
one of the hallmarks of type 2 diabetes. A similar defect is therefore present in subjects
homozygous GG at UCSNP-43 even before the onset of diabetes.

The G-allele at UCSNP-43 has a frequency of 0.75 in Mexican Americans, 0.71 in
non-Hispanic whites of German ancestry, 0.90 in African Americans and 0.94 in Asians
(Japanese). Its high frequency in African Americans and Asians implies that 81% and
88%, respectively, of the nondiabetic subjects in these two populations have the at-risk
genotype at UCSNP-43 and are thus at increased risk of diabetes due to variation at this
locus. This may account, at least in part, for the higher frequency of type 2 diabetes in
these populations (Diabetes in America, 2nd Edition. NIH Publication No. 95-1468,
1995).
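
The 81% and 88% figures correspond to the square of the G-allele frequency, i.e. the expected homozygote frequency if genotypes are in Hardy-Weinberg proportions. That assumption is inferred from the numbers rather than stated in the text; the short check below uses an illustrative function name.

```python
def homozygote_frequency(allele_frequency: float) -> float:
    """Expected frequency of the homozygous genotype under Hardy-Weinberg proportions."""
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    return allele_frequency ** 2


# G-allele frequencies at UCSNP-43 as quoted in the text.
G_ALLELE_FREQUENCY = {
    "Mexican Americans": 0.75,
    "non-Hispanic whites of German ancestry": 0.71,
    "African Americans": 0.90,
    "Asians (Japanese)": 0.94,
}

# Expected percentage of GG homozygotes in each population, rounded to a whole percent.
gg_percent = {pop: round(100 * homozygote_frequency(q))
              for pop, q in G_ALLELE_FREQUENCY.items()}
```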

Thus, the combination of pathophysiological defects (insulin resistance, impaired
glucose tolerance and defective insulin secretion) in subjects who are homozygous GG at
UCSNP-43 prior to the onset of overt type 2 diabetes provides strong supporting evidence
for an important role of this gene as a primary cause of type 2 diabetes.

EXAMPLE 7

Studies to Elucidate Linkage of Homozygous GG at UCSNP-43 to Type 2
Diabetes in Additional Populations and to Determine Whether This
Mutation Leads to Similar Physiological Effects in Other Populations

The homozygous GG at UCSNP-43 is a common genotype in populations other 
than the Mexican American subjects studied above. In view of the studies above, it is 
now possible to determine whether: (1) the linkage between this genotype and type 2 
diabetes extends across other populations, and (2) similar physiological effects of this
genotype are seen in other populations. Studies are underway to assess these two 
questions. 

The inventors are presently genotyping persons from populations other than the
Starr County, Texas, Mexican American population who have relatives with type 2
diabetes to determine whether they are homozygous GG, homozygous AA, or
heterozygous at the relevant location in UCSNP-43. Once these genotypes have been
determined, appropriate subjects from each will be subjected to the glucose tolerance test
described in Example 6 and perhaps other appropriate tests. The goals of this testing will
be to allow one to determine whether the GG genotype impairs the ability of β-cells to
increase insulin in response to glucose in these patients, whether insulin resistance and/or
other defects of glucose metabolism are present, and whether there is a linkage found
between this genotype and type 2 diabetes in this population.

EXAMPLE 8

Regulation of Insulin Secretion and Insulin Action by Calpains

As demonstrated above, a substantial part of the genetic risk for diabetes in a
Mexican American cohort is due to a common polymorphism in the intron of a gene
encoding a novel calpain-like cysteine protease, termed calpain 10. Calpains are
ubiquitously expressed cysteine proteases that are thought to act as intracellular
processing enzymes with significant substrate specificity that allows them to regulate a
variety of cellular functions including intracellular signaling, proliferation and
differentiation (Mellgren, 1997; Carafoli and Molinari, 1998; Murray et al., 1997; Ueda
et al., 1998). Although they have been implicated in the regulation of a variety of normal
cellular functions and in the pathophysiology of various disease states (Richard et al.,
1995; Chen and Fernandez, 1998; Blomgren et al., 1995; Yokota et al., 1995), a role for
calpains in glucose homeostasis has not been defined.

In this Example, the inventors show that inhibition of calpain activity with calpain
inhibitor 2 (N-Ac-Leu-Leu-methioninal, ALLM), a cell-permeable calpain inhibitor that
inhibits calpains I and II, reduces insulin secretory responses to glucose and other insulin
secretagogues in isolated mouse islets and the isolated perfused mouse pancreas. These
effects are dose dependent and reversible, are mediated, in part, by reduced responses in
intracellular Ca2+, and do not involve a reduction in glucose metabolism in the pancreatic
islet. In contrast to calpain inhibitor 2, E-64-d, a cell-permeable thiol protease inhibitor,
resulted in an increase in glucose-induced insulin secretion. In addition, ALLM reduced
insulin-mediated glucose transport in isolated rat muscle strips and isolated adipocytes
and incorporation of glucose into glycogen in muscle. These results therefore document a
previously unappreciated role for calpain-sensitive pathways in mediating insulin secretion
in the pancreatic β cell and insulin action in muscle and fat. Since inhibition of calpain
activity can reproduce the two defects that are most characteristic of type 2 diabetes, i.e.,
insulin resistance and reduced insulin secretory responses to glucose and other
secretagogues, these results indicate that alterations in calpain activity play an important
role in the pathophysiology of type 2 diabetes.

METHODS

Animals. Studies were performed on islets obtained from non-fasted 9-13 wk old
C57BL/6J mice (Jackson, Bar Harbor, ME) and adipocytes and soleus muscles isolated
from 8-12 wk old normal Wistar rats (Harlan Sprague-Dawley, Indianapolis, IN). The
calpain inhibitors used were ALLM (N-Ac-Leu-Leu-methioninal, Calbiochem-Novabiochem,
Inc, San Diego, CA) and E-64-d (ethyl (+)-(2S,3S)-3-[(S)-3-methyl-1-(3-
methylbutylcarbamoyl)butyl-carbamoyl]-2-oxiranecarboxylate, Matreya Inc., Pleasant
Gap, PA). The calpain inhibitors were dissolved in DMSO. GLP-1 (7-36 amide) was
from Peninsula Laboratory (Belmont, CA).

Static incubation of isolated pancreatic islets. Isolation of mouse pancreatic
islets was accomplished using collagenase digestion as previously described (Pontoglio et
al., 1998). Following overnight incubation in RPMI 1640 medium (11.6 mM glucose),
islets were exposed to varying concentrations of inhibitors in the same medium for 4 hr at
37°C. Islets were then pre-incubated in KRB containing 2 mM glucose and similar
concentrations of inhibitors for 60 min at 37°C. Triplicate groups of 5 islets were then
incubated in borosilicate tubes containing 1 ml of KRB with the same concentration of
inhibitor and various insulin secretagogues for one hour in a moving water bath at 37°C.
The reaction was stopped by placing the tubes on ice and an aliquot of the buffer was
removed for measurement of insulin levels. Control studies, in which the incubation
mixture contained vehicle (0.1% DMSO) only, were performed using aliquots of the
same batch of islets.

Insulin secretion from perifused islets. Insulin secretion from perifused islets
was measured using a modification of a previously described protocol (Pontoglio et al.,
1998; Sreenan et al., 1998).

Measurement of islet [Ca2+]i and NAD(P)H. Islet [Ca2+]i and NAD(P)H were
measured as previously described (Pontoglio et al., 1998; Dukes et al., 1998).

Isolation of pancreatic β-cells. Single β-cells were obtained from isolated islets
dispersed by gentle trituration (120 strokes through a 200 µl pipette tip) in Ca2+- and
Mg2+-free PBS containing 10% trypsin. Cells were plated on glass coverslips and maintained
in culture in RPMI containing 11.6 mM glucose for 48-96 hr.

Patch-clamp electrophysiology. Calcium current measurements were obtained
in the whole-cell patch-clamp configuration. Calcium currents were activated by step
depolarizations to either +10 or +20 mV for either 20 ms or 100 ms, from HP = -80 mV.
All current records are corrected for leak and capacitance. The data were filtered at 2 kHz
and then sampled every 100 µs. Pipette resistances were 1.5 - 2.5 MΩ. Series resistance
was partially compensated (~80%) using the compensation circuit of the Axopatch-1C
amplifier.

Cells were incubated for 3-4 hr in RPMI at 37°C in either 0.1% DMSO (control)
or 100 µM ALLM and then transferred for a further 1-2 hr to KRB containing similar
concentrations of DMSO or ALLM. For recording, cells were bathed in a solution
containing (in mM): 145 NaCl, 2 KCl, 1 MgCl2, 2 glucose, 10 HEPES, 10 CaCl2, pH 7.3
(adjusted with NaOH) and either DMSO or ALLM. After establishing the whole-cell
configuration, the bath solution was exchanged for a TEA-based recording medium which
contained (in mM): 155 TEA-Cl, 2 glucose, 10 HEPES, 10 CaCl2 and 100 nM TTX, pH
= 7.3 (adjusted with TEA-OH) and either DMSO or ALLM. The intracellular pipette
solution consisted of (in mM): 110 CsCl, 4 MgCl2, 20 HEPES, 10 EGTA, 0.35 GTP, 4
ATP, 14 creatine phosphate, pH = 7.3 (adjusted with CsOH).

Capacitance recordings. Capacitance measurements were made with the phase-tracking
technique in which a 60 mV peak-to-peak sine wave was superimposed on a
holding potential of -80 mV as previously described. Conductance and capacitance
values were continuously generated and recorded. The whole-cell capacitance was
canceled with the slow capacitance compensation; unbalancing the slow capacitance
compensation by 100 fF provided the capacitance calibration signal used to calculate
changes in membrane capacitance. The sinusoidal voltage template was interrupted to
deliver depolarizations to a cell. Beta cells were stimulated with a train of ten step
depolarizations to +20 mV (HP = -80 mV). Each step depolarization lasted 150 ms and
was separated by 400 ms interpulse duration. Capacitance measurements were carried
out in the perforated whole-cell configuration. The data were collected at a 500 µs
sampling rate and filtered at 5 kHz. Recordings with series resistance > 20 MΩ were
discarded. Series resistance compensation was applied in all recordings.
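
As an illustration of the stimulus timing described above, the train can be laid out as a list of voltage segments. The sketch assumes that the 400 ms interpulse duration is the gap between the end of one step and the start of the next, spent at the holding potential; the function name and the segment representation are illustrative and do not describe the acquisition software actually used.

```python
def depolarization_train(n_steps=10, step_ms=150.0, gap_ms=400.0,
                         holding_mv=-80.0, step_mv=20.0):
    """Return (start_ms, end_ms, potential_mv) segments for a train of step depolarizations."""
    segments = []
    t = 0.0
    for i in range(n_steps):
        # Depolarizing step to +20 mV.
        segments.append((t, t + step_ms, step_mv))
        t += step_ms
        if i < n_steps - 1:
            # Return to the holding potential between steps (assumed gap definition).
            segments.append((t, t + gap_ms, holding_mv))
            t += gap_ms
    return segments


train = depolarization_train()
total_ms = train[-1][1]                         # duration from first step onset to last step end
samples_at_500us = int(total_ms * 1000 / 500)   # samples collected at one point every 500 µs
```

Under these assumptions the train lasts 10 x 150 ms + 9 x 400 ms = 5100 ms.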

The recording solution for the capacitance measurements contained (in mM): 130
NaCl, 2 glucose, 10 Na-HEPES, 1 MgCl2, 2 KCl, and 5 CaCl2, pH 7.3 with NaOH. The
pipette solution contained (in mM): 135 Cs-glutamate, 10 Na-HEPES, 9.5 NaCl, 0.5
TEA-Cl, and 0.5 CaCl2, pH 7.3 with CsOH. The pipettes were backfilled with an identical
solution to which amphotericin B (final concentration of 0.5 mg/ml) was added and then
sonicated. The amphotericin B stock solution (125 mg/ml) was kept frozen at -20°C and
used for one week. ALLM pre-treatment was as described above (see patch-clamp
electrophysiology section). All electrophysiological recordings were carried out at room
temperature (22-24°C).
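
The amphotericin B concentrations quoted above imply a 250-fold dilution of the stock. A minimal check of that arithmetic follows; the 1 ml final volume is a hypothetical example, not a volume given in the text.

```python
def stock_volume_ul(stock_mg_ml: float, final_mg_ml: float, final_volume_ul: float) -> float:
    """Volume of stock needed so that C_stock * V_stock = C_final * V_final."""
    if final_mg_ml > stock_mg_ml:
        raise ValueError("final concentration cannot exceed the stock concentration")
    return final_mg_ml * final_volume_ul / stock_mg_ml


dilution_factor = 125.0 / 0.5                     # stock / final concentration
ul_per_ml = stock_volume_ul(125.0, 0.5, 1000.0)   # stock needed per hypothetical 1 ml of pipette solution
```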

Measurement of calpain activity in mouse pancreatic islets. Islets were loaded
with the fluorogenic, membrane-permeant calpain substrate
t-butoxycarbonyl-Leu-Met-7-amino-4-chloromethylcoumarin (Boc-Leu-Met-CMAC, 10 µM; Molecular Probes,
Eugene, OR) in HEPES (10 mM) buffered KRB with 2 mM glucose, and the fluorescence
emitted from the proteolytic product in islets was measured with a bandpass combination
between 400 and 500 nm following excitation by light at 340 nm. Studies were
performed after a 4 hr incubation in the presence of either 200 µM ALLM, 200 µM E-64-d
or vehicle.

Glucose utilization and oxidation rates. Glucose utilization and oxidation rates
were measured as previously described (Dukes et al., 1998; Zhou et al., 1996) in mouse
islets cultured as described above in the presence or absence of calpain inhibitors.

Glycogen synthesis rates in skeletal muscle. Measurement of glycogen
synthesis rates was performed using a modification of a previously described protocol
(Burant et al., 1984) in soleus muscle strips isolated from non-fasted normal Wistar rats
and incubated in KRB/5 mM glucose/10 mM HEPES, 0.2% BSA in the presence and
absence of 100 µM ALLM, 200 µM E-64-d or vehicle.

2-Deoxyglucose uptake into skeletal muscle and adipose tissue. 2-Deoxyglucose
(2-DOG) uptake by isolated strips of soleus muscle from normal Wistar
rats was measured using a modification of a previously described protocol (Burant et al.,
1984). Following a 30 min pre-incubation in KRB containing no glucose, 2 mM
pyruvate, 10 mM HEPES, 0.2% BSA and ALLM or E-64-d, the muscle strips were
transferred to identical medium containing 0.1 mM 2-deoxy-[2,6-3H]glucose (0.5 µCi/ml)
and 0.1 mM [14C]-sucrose (0.2 µCi/ml) and incubated for another 30 min at 37°C.
Muscles were then extracted and 2-DOG uptake calculated as previously described
(Burant et al., 1984).

Adipocytes were isolated from epididymal fat pads of 3 month old male Wistar
rats as described previously (Robdell, 1964) with the following modification: fat pads
were minced to 1-2 mm pieces and incubated for ~25 min at 37°C with collagenase I (1
mg/ml, Worthington Biochemicals, Lakewood, NJ) in KRB containing 10 mM HEPES
(pH 7.4), 0.2% BSA and 2 mM sodium pyruvate. The cell suspension was filtered
through a nylon mesh (134 µm, Spectrum Lab., Laguna Hills, CA), washed three times by
floating and allowed to rest for 45 min in the KRB. For the measurement of basal and
insulin-stimulated transport of glucose, 200 µl aliquots of adipocytes (2 x 10^5 cells/ml)
were incubated for 120 min at 37°C in KRB with different concentrations of insulin
either with or without 100 µM ALLM or E-64-d. Then another 50 µl of KRB containing 5
mM 2-DOG (final concentration 1 mM) and 0.5 µCi of 2-deoxy-[2,6-3H]glucose was added
and cells were incubated for a further 5 min at 25°C. The transport was stopped by
adding cytochalasin B (final concentration 50 µM) and cells were spun through 250 µl of
dinonyl phthalate oil (Fisher Scientific, Pittsburgh, PA). Cells were then transferred to
scintillation vials for counting.

Assay methods. Insulin concentrations were measured by a double antibody
radioimmunoassay using a rat insulin standard. The intra-assay coefficient of variation
for this technique is 7%. All samples were assayed in duplicate.

Statistical analysis. Results are expressed as mean ± SEM. In each experimental
protocol, summary measures of the experimental response, e.g., areas under the insulin,
NAD(P)H or [Ca2+]i response curves, were compared in the presence and absence of
calpain inhibitor. The statistical significance of differences in the presence and absence
of the inhibitor was assessed at the 5% level using the non-paired Student's t-test, paired
t-test, ANOVA or Wilcoxon rank sum test where appropriate.



Results 

ALLM (250 µM) and E-64-d (200 µM) increased the insulin secretory responses
to 20 mM glucose in isolated pancreatic islets by 1.97±0.3-fold (n=5, p<0.01;
mean±SEM) and 1.77±0.1-fold (n=6, p<0.001), respectively (FIG. 8A and FIG. 8B).
These effects were not observed at 2 mM glucose. The effects of ALLM and E-64-d on
the insulin secretory response to 20 mM glucose (FIG. 8C and FIG. 8D) were seen at
inhibitor concentrations greater than 100 µM and were glucose dependent in that the
insulin secretory response was enhanced at glucose concentrations above 8 mM
but significant effects were not observed at 2, 4 or 6 mM glucose (FIG. 9A). The
enhancement of the insulin secretory response to 20 mM glucose by ALLM and E-64-d
was also observed in a dynamic islet perifusion system (FIG. 9B). ALLM produced a
small but statistically significant increase in the insulin secretory response to 50 nM GLP-1
(1.55±0.2-fold, n=6, p<0.05), an agent which stimulates adenyl cyclase. ALLM did not,
however, significantly increase the insulin secretory responses to 30 mM KCl, an agent
which directly depolarizes the β-cell (FIG. 9C), or 100 µM carbachol (CCh), which
mobilizes Ca2+ from intracellular stores.



Membrane capacitance measurements confirm the large enhancement of insulin
secretion observed after ALLM pre-treatment (FIG. 10). Representative capacitance
changes, elicited by a train of depolarizations, from control (top) and ALLM pre-treated
cells (bottom) are shown in FIG. 10A. Stimulation induced much larger average changes
in membrane capacitance in ALLM pre-treated cells in comparison to control cells
(FIG. 10B).

The stimulatory effects of ALLM and E-64-d on insulin secretion in response to
high glucose were not associated with increases in [Ca2+]i (FIG. 11A and FIG. 11B). This
observation was supported by the finding that calcium currents were similar in
control and ALLM treated cells (FIG. 11C); no differences in amplitude or kinetics were
apparent. In addition, no shifts in voltage-dependence were observed and the average
peak calcium current densities obtained in control and ALLM pre-treated cells were
comparable (FIG. 11D).

Rates of glucose utilization at basal (2 mM glucose) and stimulatory glucose
concentrations (20 mM) in the presence of 100 µM ALLM (14.5±3.6 and 89.5±3.0
pmol/islet/hr, respectively) or 200 µM E-64-d (15.5±4 and 79.5±9.5 pmol/islet/hr,
respectively) were not significantly different from those in islets incubated in their
absence (14.5±2.1 and 76.5±6.5 pmol/islet/hr, n=3 in each case). Similarly, there was no
significant difference in the glucose oxidation rates at basal or stimulatory glucose
concentrations in the presence of ALLM (6.0±0.7 and 39.5±4.1 pmol/islet/hr) and E-64-d
(5.2±0.4 and 40.0±2.4) compared to those measured in the absence of inhibitor (4.4±0.8
and 32.5±6.5 pmol/islet/hr, n=3 in each case). Consistent with a lack of effect of ALLM
and E-64-d on β-cell glucose metabolism, the NAD(P)H response to an increase in the
glucose concentration from 2 to 14 mM in the presence of 100 µM ALLM (2.7±0.4-fold
increase, n=4) and E-64-d (2.8±0.3-fold increase, n=2) was not significantly different
from controls (2.6±0.2-fold increase, FIG. 11E and FIG. 11F).

In order to document that ALLM and E-64-d were indeed inhibiting calpains
rather than other cysteine proteases, calpain activity was measured in isolated islets using
the fluorogenic calpain specific substrate Boc-Leu-Met-CMAC (FIG. 12). Although this
compound does not allow us to distinguish between different calpain isozymes, it does
appear to be a substrate for calpains and not for other lysosomal proteases under
physiological conditions. In islets incubated in the presence of 200 µM ALLM or 200 µM
E-64-d, the rate of generation of the fluorescent signal was lower than in islets
incubated in the absence of the calpain inhibitors. The area under the curve measuring
the rate of generation of the fluorescent product was reduced to 35±4% (n=3, p<0.05) and
45±5% (n=4, p<0.05) of control values in the presence of ALLM (200 µM) and of E-64-d
(200 µM), respectively.

The inventors also examined the effects of other protease inhibitors on insulin
secretion. Insulin secretory responses to 20 mM glucose were not altered in the presence
of pepstatin A (100 µM), an aspartic protease inhibitor, or Cathepsin B inhibitor 2
(100 µM), a lysosomal cysteine protease inhibitor, indicating that the effects of
ALLM and E-64-d on insulin secretion are not seen with all protease inhibitors.

Since decreased insulin action in peripheral tissues defines insulin resistance and
is a prominent feature of type 2 diabetes, we determined whether ALLM and E-64-d
affected insulin stimulated 2-deoxyglucose (2-DOG) uptake in muscle and fat cells. The
uptake of 2-DOG into normal rat adipocytes (FIG. 13A) was increased approximately 3-fold,
from 456.5±59 pmol/2 × 10^5 cells/5 min (n=6) to 1384±178 pmol/2 × 10^5 cells/5 min
(p<0.05, n=4), by the addition of insulin (12 nmol/L). However, in the presence of 100
µM ALLM, insulin failed to increase 2-DOG uptake into adipocytes significantly
(598±102 vs. 751±71 pmol/2 × 10^5 cells/5 min, n=4, p>0.05). Similarly, in the presence
of 200 µM E-64-d, insulin failed to increase 2-DOG uptake into adipocytes significantly
(361±29 vs. 749±129 pmol/2 × 10^5 cells/5 min, n=4, p>0.05).
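
As a check on the "approximately 3-fold" figure, the fold stimulation by insulin can be computed as the ratio of the quoted mean uptakes. This is only a sketch on the reported means; the dictionary and function names are hypothetical, and a ratio of means says nothing about statistical significance, which depends on the variability within each group.

```python
# Mean 2-DOG uptake (pmol/2 x 10^5 cells/5 min) quoted in the text,
# given as (without insulin, with insulin) for each condition.
uptake = {
    "control": (456.5, 1384.0),
    "ALLM": (598.0, 751.0),
    "E-64-d": (361.0, 749.0),
}


def fold_change(basal, stimulated):
    """Ratio of insulin-stimulated uptake to basal uptake."""
    return stimulated / basal


folds = {name: fold_change(*pair) for name, pair in uptake.items()}
for name, value in folds.items():
    print(f"{name}: {value:.2f}-fold")
```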

Insulin mediated glucose transport into strips of soleus muscle was also reduced
by 100 µM ALLM or 200 µM E-64-d (FIG. 13B). Insulin (12 nM) increased 2-deoxyglucose
uptake into rat soleus muscle strips from 0.26±0.01 to 0.47±0.03 µmol/ml
H2O/30 min (p<0.05, n=5). However, in the presence of ALLM (0.28±0.04 vs. 0.34±0.05
µmol/ml H2O/30 min, n=5, p>0.05) or E-64-d (0.31±0.02 vs. 0.36±0.02 µmol/ml H2O/30
min, n=5, p>0.05), insulin failed to stimulate a significant increase in muscle glucose
uptake.

Rates of glycogen synthesis were measured in soleus muscle strips (FIG. 13C).
Insulin (6 nM) increased the rate of muscle glycogen synthesis from 0.58±0.08 to
1.55±0.20 nmol glucose/mg/hr (n=6, p<0.005). In the presence of 100 µM ALLM
(0.27±0.03 vs. 0.40±0.05 nmol glucose/mg/hr, n=6, p<0.01) and 200 µM E-64-d
(0.49±0.08 vs. 0.80±0.14 nmol glucose/mg/hr, n=6, p<0.04), insulin still caused a
significant increase in muscle glycogen synthesis. However, the magnitude of the increase
was significantly lower in the presence of both ALLM (0.13±0.03 nmol glucose/mg/hr,
p<0.01) and E-64-d (0.31±0.09 nmol glucose/mg/hr) than in muscle strips not exposed to
these inhibitors (0.97±0.19 nmol glucose/mg/hr, p<0.01).
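
The magnitudes of the insulin-induced increase quoted above agree with the differences between the mean rates measured with and without insulin. The following sketch reproduces that arithmetic from the reported means only; the names are hypothetical, and the SEM of each increment cannot be recovered from the means and so is not computed.

```python
# Mean glycogen synthesis rates (nmol glucose/mg/hr) quoted in the text,
# given as (without insulin, with insulin) for each condition.
rates = {
    "control": (0.58, 1.55),
    "ALLM": (0.27, 0.40),
    "E-64-d": (0.49, 0.80),
}


def insulin_increment(basal, stimulated):
    """Insulin-stimulated increase in rate, rounded to two decimals."""
    return round(stimulated - basal, 2)


increments = {name: insulin_increment(*pair) for name, pair in rates.items()}
for name, delta in increments.items():
    share = 100 * delta / increments["control"]
    print(f"{name}: +{delta:.2f} nmol glucose/mg/hr ({share:.0f}% of control increment)")
```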

The specific calpain isozyme(s) or cysteine protease(s) implicated in the control of
insulin secretion and insulin action in the studies described above are unknown.
Isozyme-specific inhibitors are not available and ALLM and E-64-d inhibit both calpains and
cathepsins. However, the inhibition of hydrolysis of the substrate Boc-Leu-Met-CMAC
by ALLM and E-64-d in pancreatic islets supports the hypothesis that ALLM and E-64-d
increase insulin secretion by inhibiting calpain activity rather than affecting lysosomal
cysteine proteases such as the cathepsins. The identification of the specific calpain(s)
involved must await the development of more specific inhibitors. The concordance of the
present results with those from molecular genetic and clinical studies showing a role for
calpain 10 in the development of type 2 diabetes and insulin resistance suggests that this
calpain isozyme is important in mediating the observed effects.

The present studies also provide insight into the molecular mechanism by which
ALLM and E-64-d increase the insulin secretory responses to glucose and GLP-1. These
agents did not lead to an increase in [Ca2+]i, rates of glucose oxidation and utilization, or
NAD(P)H generation. Thus, they do not affect pathways in the β-cell responsible for the
uptake and metabolism of glucose. Rather, the inventors believe that the most likely
site(s) of action are in pathways that regulate the movement or fusion of insulin secretory
granules with the plasma membrane.



In addition to a role in insulin secretion, the inventors have demonstrated that
calpain inhibition results in reduced insulin stimulated glucose transport into fat and
muscle and reduced muscle glycogen synthesis and thus reproduces the defects in insulin
action that are the hallmarks of insulin resistant states including type 2 diabetes. Taken in
conjunction with genetic and physiological studies showing that a common
polymorphism in calpain 10 is associated with an increased risk of type 2 diabetes,
decreased muscle mRNA levels and insulin resistance, these findings provide additional
support for the notion that calpains play an important role in the regulation of insulin
action, perhaps by downregulating IRS-1 or promoting adipocyte differentiation. It is
interesting to note that insulin resistance in muscle and fat is commonly associated with
hypersecretion of insulin in subjects predisposed to the later development of type 2
diabetes. Alterations in calpain expression and/or calpain activity in diverse tissues may
therefore represent a common unifying pathogenetic mechanism for the development of
type 2 diabetes that accounts for both insulin resistance and the resulting compensatory
increase in insulin secretion.

EXAMPLE 9

Long-term Effects of Calpain Inhibition on Beta Cell Function 

Insulin secretory responses to glucose in islets that had been treated with calpain
inhibitor II (ALLM) and E-64-d were measured. As shown in FIG. 14, 48 hours of exposure
to 100 µM ALLM or 200 µM E-64-d attenuated the insulin secretory response to 20
mM glucose by approximately 50-60% relative to islets treated with vehicle. There was
no significant difference in the basal insulin secretion (at 2 mM glucose) between
inhibitor- and control-treated islets. Also, the insulin content in islets treated for 48 hours
with the two inhibitors was comparable to that in control islets (FIG. 15). In
experiments performed to document the dose response relationship between calpain
inhibitor II concentration and inhibition of insulin secretion, inhibition was achieved
with 25 µM ALLM (FIG. 16). The long-term inhibitory effects of calpain inhibitors on
glucose-induced insulin secretion were also demonstrated in a dynamic perifusion system
(FIG. 17). To confirm the viability of islets treated with the cysteine protease inhibitors,
we tested the reversibility of the inhibitory effect of ALLM and E-64-d on insulin
secretion. Islets were treated with 100 µM ALLM or 200 µM E-64-d for 48 h, and then
cultured for a further 48 h either in the presence or absence of the inhibitors. In this set of
experiments, glucose-induced insulin secretion (20 mM) was inhibited by more than 80%
in islets treated with ALLM or E-64-d for 96 h. In contrast, those islets that had been
allowed to recover for 2 days following 48 h treatment with the inhibitors exhibited an
essentially normal insulin secretory response to 20 mM glucose (FIG. 18). In conjunction
with the normal insulin contents in 48 h treated islets, these data exclude the possibility
of cell death or non-specific toxic effects resulting from 48 h treatment with the cysteine
protease inhibitors.

To further characterize the defect in glucose induced insulin secretion, insulin
secretory responses to secretagogues that enter the signal transduction pathway at
different levels were studied. As shown in FIG. 19, ALLM or E-64-d treated islets
responded normally to glyceraldehyde. However, the insulin secretory responses to
ketoisocaproic acid (KIC), a nutrient that stimulates mitochondrial metabolism directly,
were decreased (FIG. 19). The insulin secretory response to 30 mM KCl, which directly
depolarizes the β-cell membrane, was significantly reduced in E-64-d treated islets (FIG.
19). Insulin secretory responses to mastoparan, a G-protein activator known to be a
potent stimulator of secretion, and carbachol, a muscarinic agonist that stimulates insulin
secretion through activation of phospholipase C and release of intracellular Ca2+ stores,
were attenuated in islets that had been treated for 48 h with ALLM or E-64-d (FIG. 20).

Because ALLM and E-64-d are not specific for calpain (they may also
inhibit cathepsins and other proteases, such as those of the proteasome), the effects of
additional protease inhibitors on insulin secretion were tested. As listed in Table 8,
treatment of mouse islets with 100 µM ALLN (calpain inhibitor I, a small peptide
inhibitor of calpain structurally similar to ALLM) for 48 h inhibited 20 mM
glucose-stimulated insulin secretion by 88 ± 9% (P<0.001, N=4). A similar result was obtained
with MDL28170 (another cell-permeable peptide calpain inhibitor): 48 h of exposure to
50 nM MDL28170 inhibited insulin secretion by approximately 60% (P<0.05, N=4).
Therefore, these two different calpain inhibitors were as effective in blocking insulin
secretion as ALLM and E-64-d. In contrast, culturing islets with 100 µM Cathepsin B
Inhibitor II (a small peptide inhibitor of cathepsin B) or 20 µM lactacystin (a
Streptomyces metabolite that is a specific, cell-permeable, irreversible inhibitor of the
proteasome) for 48 h did not significantly affect either basal or glucose-stimulated insulin
secretion, indicating that inhibition of the activities of cathepsin B and the proteasome is
unlikely to be the cause of the defective insulin secretion associated with long term
treatment with ALLM or E-64-d.

Table 8. Glucose-induced insulin secretion in islets treated with different protease
inhibitors for 48 h.

                                          Insulin secretion (% of control treated islets)
Inhibitor              (µM)     N         2 mM glucose        20 mM glucose
ALLM                   100      5         136 ± 25            31 ± 4
ALLN                   100      3         110 ± 8             12 ± 2
MDL28170               100      4         111 ± 35            41 ± 18
Cath B Inhibitor II    50       3         114 ± 19            89 ± 25
Lactacystin            20       3         88 ± 15             95 ± 13
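
The percentage inhibition figures quoted in the text follow from the 20 mM glucose column of Table 8 as 100 minus the percentage of control. The following is a minimal sketch of that conversion using the tabulated means only; the names are hypothetical.

```python
# Insulin secretion at 20 mM glucose as % of control treated islets
# (mean values from Table 8).
percent_of_control = {
    "ALLM": 31,
    "ALLN": 12,
    "MDL28170": 41,
    "Cath B Inhibitor II": 89,
    "Lactacystin": 95,
}


def percent_inhibition(pct_of_control):
    """Convert secretion expressed as % of control into % inhibition."""
    return 100 - pct_of_control


inhibition = {name: percent_inhibition(v) for name, v in percent_of_control.items()}
for name, value in inhibition.items():
    print(f"{name}: {value}% inhibition")
```

On these means, ALLN corresponds to 88% inhibition and MDL28170 to 59%, in line with the 88% and approximately 60% stated in the text.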



Insulin secretory responses to glucose and most other secretagogues are mediated
by a rise in intracellular free calcium ([Ca2+]i). We therefore measured [Ca2+]i responses
to glucose, KIC and KCl using Fura-2 as the Ca2+ indicator. In comparison to the
responses from the control islets, the most prominent abnormality in [Ca2+]i responses to
glucose and KIC was a delay of the [Ca2+]i responses (FIG. 21). The mean time interval
between administration of 14 mM glucose and the point of half maximal response (T1/2)
was 120 ± 14 seconds in control islets. The T1/2 values of [Ca2+]i responses to glucose in
ALLM and E-64-d treated islets were significantly delayed, to 319 ± 42 and 265 ± 65 seconds,
respectively (P<0.001 for both groups, n=5). The [Ca2+]i responses to KIC in ALLM and
E-64-d treated islets were also delayed (251 ± 42 and 330 ± 7 seconds, respectively)
compared to control islets (125 ± 9 seconds, P<0.001 for each group). In addition to the
delay in [Ca2+]i responses to the two nutrients, the integrated [Ca2+]i responses in the
inhibitor-treated islets, calculated as the area under the curves of the [Ca2+]i responses, were
significantly smaller than those of the control islets (FIG. 21). The diminished [Ca2+]i
response to glucose in ALLM and E-64-d treated islets was also documented in ramp
experiments in which a gradually increasing level of glucose (from 2 to 26 mM) over 48
min was applied to islets while changes in [Ca2+]i were monitored. The [Ca2+]i responses
to 30 mM KCl were not different between control and ALLM or E-64-d treated islets.
There was no delay in the appearance of the [Ca2+]i response to KCl, nor was the magnitude
of the response reduced.
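
The T1/2 measure described above (the interval between stimulus administration and the point of half maximal response) could be extracted from a sampled trace along the following lines. This is a sketch under stated assumptions, not the inventors' procedure: the function name, the use of linear interpolation between samples, the definition of half maximal as halfway between baseline and peak, and the sample trace are all hypothetical, and time zero is taken to be the moment the stimulus is applied.

```python
def half_max_time(times, values, baseline):
    """Time at which a trace first reaches half of its peak rise above baseline.

    Linear interpolation is used between the two samples that bracket the
    half-maximal level. Returns None if the trace never rises above baseline.
    """
    peak = max(values)
    if peak <= baseline:
        return None
    half_level = baseline + (peak - baseline) / 2.0
    for i in range(len(values)):
        if values[i] >= half_level:
            if i == 0:
                return times[0]
            # Sample i-1 is below the half-maximal level, sample i is at or above it.
            t0, t1 = times[i - 1], times[i]
            v0, v1 = values[i - 1], values[i]
            return t0 + (half_level - v0) * (t1 - t0) / (v1 - v0)
    return None


# Hypothetical fluorescence ratio trace sampled every 60 s after stimulus.
times = [0, 60, 120, 180, 240]
trace = [1.0, 1.0, 1.5, 2.0, 2.0]
print(half_max_time(times, trace, baseline=1.0))
```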

The attenuated insulin secretory and [Ca2+]i responses to glucose and KIC
suggest a possible defect in glucose metabolism, more specifically in mitochondrial
metabolism. Therefore, glucose metabolism in islets that had been treated with ALLM and
E-64-d for 48 h was measured. As depicted in FIG. 22, no significant changes were
observed in the rates of basal glucose utilization and oxidation at 3.3 mM glucose in islets
treated with either inhibitor. However, the rates of glycolysis and glucose oxidation at
stimulating concentrations of glucose were significantly reduced in ALLM or E-64-d
treated islets compared to the controls. This is again distinct from the acute treatment,
where rates of glucose utilization and oxidation were not changed by the 4-h treatment
with the inhibitors.

As an additional measurement of glucose metabolism, we monitored NAD(P)H
autofluorescence changes in response to glucose and KIC. NAD(P)H responses to
glucose were comparable in control and treated islets, whereas the responses to KIC in
ALLM or E-64-d treated islets were significantly reduced in comparison with control
islets (FIG. 23). Unlike the [Ca2+]i responses, there was no significant delay in the onset
of NAD(P)H responses to glucose and KIC.

In previous work, we demonstrated that under short-term incubation
conditions, ALLM and E-64-d enhanced insulin secretion via a direct activation of the
exocytosis of insulin. After 4 days of culture with the same two calpain inhibitors,
significantly lower rates of exocytosis in β-cells were demonstrated (FIG. 24). Moreover,
significantly enlarged vesicles were observed in ALLM and E-64-d treated β-cells stained
with 1 µM quinacrine (FIG. 25). Quinacrine is a dye that partitions specifically into acidic
vesicles. Using confocal microscopy, control treated β-cells were found to contain a large
number of vesicles of around 100 micrometer in size (FIG. 25). In ALLM and E-64-d
treated (48 h) cells, the size of those quinacrine-stained vesicles was increased more than
3-fold (FIG. 25). Together with the capacitance measurement and insulin secretion assay,
these data indicate that long-term inhibition of cysteine proteases has a direct impact on
the exocytotic machinery of the β-cells.

Following 48 h treatment with 100 µM ALLM or 200 µM E-64-d, the
residual calpain activity in intact islets, determined by monitoring the cleavage of a
specific fluorogenic substrate, was 54 ± 3% and 55 ± 4% of that in control treated islets,
respectively (FIG. 26).

EXAMPLE 10 

Use of Animal Models to Deduce the Mechanisms Causing Impairment 
of Insulin Function in Persons having the GG Phenotype 

Transgenic models will be created in mice in which calpain proteases, and
particularly calpain 10 containing the variant GG at UCSNP-43, will be overexpressed in
tissues relevant to diabetes (muscle, liver and the pancreatic beta cell). Experiments will
be performed to determine if this targeted tissue overexpression of calpains results in
dysfunction of the target tissue, e.g., reduced glucose induced insulin secretion, insulin
resistance, increased hepatic glucose production. Embryonic stem cell technology will be
used to eliminate specific forms of the calpains either in the whole animal or in the
specific tissues listed above. Physiological studies in the animals will characterize
alterations that occur in each of these target tissues resulting from a lack of calpain
expression.

In addition, experiments will be performed to determine if altered calpain
expression and/or action is playing a role in the pathophysiology of existing models of
type 2 diabetes, i.e., the ob/ob mouse, the db/db mouse and the ZDF rat (Baetens et al.,
1978; Coleman, 1979; Friedman et al., 1991).



EXAMPLE 11 

Use of Calpain Inhibitors in Animals and Humans to Treat Diabetes 

The present example describes methods of treating diabetes by modulating the
function of one or more calpains in at least one of a β-cell, muscle cell, or fat cell with a
modulator of calpain function. A preferred embodiment would be a method of treating
diabetes comprising stimulating calpain activity in a fat cell or muscle cell with a
modulator of calpain function and inhibiting calpain activity in a β-cell with a modulator
of calpain function.

Calpain modulators, such as those described in this application, can be
administered to animal models of diabetes, including the existing models of type 2
diabetes, the ob/ob mouse, the db/db mouse and the ZDF rat (Baetens et al., 1978;
Coleman, 1979; Friedman et al., 1991). These modulators can also be administered
to transgenic animals, such as those described in Example 10. These modulators can be
formulated and administered by any of the means described in this application to better
delineate optimal dosages, routes of delivery, formulations and so on. Physiological
studies in the animals will characterize effects of the calpain modulators on various
parameters, including measurements of glucose induced insulin secretion, insulin
resistance, and hepatic glucose production. Modulators that have anti-diabetic effects in
these animal models are candidates for further animal experimentation and eventual
human clinical trials.

As experimental animal models and other systems are developed for testing
calpain modulators, novel modulators with improved bioactivity can be developed.
Improved bioactivity may be defined as optimized half-life in vivo, preference for target
cells (especially β-cells, muscle or fat), reduced side effects such as toxicity and
immunogenicity, or any other measure of improved efficacy. These novel modulators can
be developed by any means, including but not limited to combinatorial libraries, random
mutagenesis or modifications, and rational drug design.

Lead compounds having calpain modulating activity and efficacy in animal
diabetic models are candidate compounds for human clinical trials. Human clinical trials
will necessitate further definition of optimal dosage, formulation and administration route.
Human trials will further evaluate bioactivity, drug half-life, tissue specificity, toxicity
and immunogenicity. Human trials will also define patient indications for treatment with
calpain inhibitors as well as define combination therapies of calpain modulators with
existing or new drugs aimed at treating diabetes.

* * * 

All of the compositions and methods disclosed and claimed herein can be made 
and executed without undue experimentation in light of the present disclosure. While the 
compositions and methods of this invention have been described in terms of preferred 
embodiments, it will be apparent to those of skill in the art that variations may be applied 
to the compositions and methods and in the steps or in the sequence of steps of the 
method described herein without departing from the concept, spirit and scope of the 
invention. More specifically, it will be apparent that certain agents which are both 
chemically and physiologically related may be substituted for the agents described herein 
while the same or similar results would be achieved. All such similar substitutes and 
modifications apparent to those skilled in the art are deemed to be within the spirit, scope 
and concept of the invention as defined by the appended claims. 